We present a new decoding protocol to realize transmission of classical information
through a quantum channel at asymptotically maximum capacity, achieving the
Holevo bound and thus the optimal communication rate. At variance with previous
proposals, our scheme recovers the message bit by bit, making use of a series of “yes-no”
measurements, organized in bisection fashion, thus determining which codeword was
sent in $\log_2 N$ steps, $N$ being the number of codewords.
I. INTRODUCTION


One of the main achievements in quantum information theory has been the development
of a generalization of Shannon’s theory for quantum communication. In particular, the
Holevo bound [2] sets a limit on the rate of reliable transmission of classical information
through a quantum channel, which is also achievable in the asymptotic limit of infinitely
long sequences [4, 15]. Consequently, via proper optimization and regularization, it provides
the quantum analog of the Shannon classical capacity formula.

The original proof [4] was carried out by extending to the quantum regime the concept of
typical subspaces used in Shannon communication theory [718]. A crucial point is the choice of a
proper POVM which allows Bob to identify the right message with small error probability.
The first explicit detection scheme used in this context is a one-step collective-measurement
POVM known as Pretty Good Measurement (PGM) [56], highly effective theoretically but not
easily realizable in practice.

Following the proof of Ogawa and Nagaoka [10], Hayashi and Nagaoka [11], which establishes
a connection with the quantum-hypothesis-testing problem [9], the possibility of
asymptotically achieving the bound through a series of “yes-no” projective measurements was
investigated [13, 15]. This sequential protocol checks whether the received state resides in
the typical subspace of a given codeword, for each codeword in the code, until it receives a
positive answer or else declares failure. The “yes-no” question is asked, for each codeword,
by applying the projector on its typical subspace, which makes the decoding protocol
more suited for practical implementations than the PGM. Indeed a design for an explicit
and structured optical receiver which used this protocol was proposed [20], with
applications both to optical communication and quantum reading. In particular, for a lossy bosonic
channel [22] (a model most commonly used to represent realistic fiber and free-space
communication) it was shown that the sequential decoder can be built with Gaussian displacement
operators and vacuum-or-not measurements [15, 19, 23]. An alternative, near-explicit approach
for capacity-achieving classical-quantum communication was also recently developed by
Wilde and Guha, adapting to the quantum scenario the classical polar coding introduced
by Arikan [25]. In particular, making use of optimal Helstrom measurements in the
quantum-hypothesis-testing procedure and of Sen’s non-commutative union bound, they proposed
an encoding technique which realizes channel polarization and consequently introduced a
quantum successive cancellation decoder. Later work modified such decoding strategy to
obtain a partially non-collective measurement [26] and extended polar coding to private and
quantum communication through arbitrary qubit channels. The relevance of this
approach is associated with the fact that, at variance with other proposals [4, 5, 13], it allows
optimal decoding with a number of collective measurements that is linear in the number of bits.

In this paper we propose a bisection decoding scheme for classical communication through
a quantum channel and show that it achieves the maximum capacity in the asymptotic limit
of infinitely long codewords, providing yet another proof of the attainability of the
Holevo bound. While being inspired by the sequential decoding algorithm, analogously
to Refs. [23] and [21], our scheme exhibits an exponential advantage in the number of
measurements which have to be performed in order to recover the message: specifically, if the
sequential method is built on $O(N)$ concatenated “yes-no” detections, where $N$ is the
number of codewords, the bisection method only requires $\log_2 N$ of such “atomic” steps, thus
scaling linearly with the number of bits $n$ which one wishes to transmit. We stress however
that, since our individual detections are explicitly many-body operations, at present we have
no evidence that such an advantage could be translated into a decoding
scheme which is efficient from the computational point of view, i.e. in terms of the number
of quantum gates one has to apply to the received string of quantum information carriers. A
similar problem arises also in the case of polar codes (see e.g. Ref. [26]), and it is caused
by the lack of an explicit implementation (or at least of an estimate of its complexity) of
the “atomic” steps involved in the two protocols, i.e. the “yes-no” set detections for the
present method and the Helstrom measurement for polar coding. Still we believe that our
method can be of some interest as it widens the class of known decoding strategies which
are asymptotically optimal, hence increasing the chances of identifying at least one which
is suitable for implementation. In this respect it is also worth noticing that the proposed
scheme exhibits the nontrivial advantage of gaining a bit of information at each step of
the procedure, a feature which may be extremely appealing when dealing with faulty
decoders, as it allows partial identification of the transmitted message even in the presence of
subsequent detection failures.

As in all the previous works on the subject, in our derivation we heavily rely on the
structure of typical projectors, although we need to properly combine them in order to build
efficient “yes-no” set measurements which reconstruct the message bit-by-bit by checking,
at each step of the procedure, whether the received message belongs to one of two possible
sets of codewords. In an effort to make the paper self-contained, we reproduce a series of
known results [1], providing, in some cases, alternative proofs which are explicitly presented
in the framework which best fits the proposed approach.

The paper is organized as follows: we start in Sec. II, where we introduce the notation and
state the problem in a rigorous way. In Sec. III we present some mathematical tools which
are important to derive our results. In particular Sec. III A is devoted to reviewing some basic
facts about the structure of typical subspaces of a quantum source, while Sec. III B discusses
a few Lemmas which allow us to put bounds on the probability of retrieving certain POVM
outcomes from states which are close to each other. The bisection protocol is introduced
in Sec. IV, identifying a sufficient condition which ensures it can asymptotically attain
the Holevo bound in Sec. IV B and presenting three different methods which satisfy this
condition. Conclusions are finally given in Sec. V.


II. THE PROBLEM: ACHIEVING THE HOLEVO BOUND 

Consider a memoryless quantum communication channel described by a completely
positive, trace preserving (CPT) mapping $\mathcal{T}$ that Alice (the sender of the communication
scheme) uses to transmit classical messages to Bob (the receiver). Given an alphabet $\mathcal{A}$ of
classical symbols, we define an $N$-element code $\mathcal{C} := \{\vec{j}^{(1)}, \ldots, \vec{j}^{(N)}\}$ as a subset of $\mathcal{A}^n$ which
contains $N$ selected $n$-long strings $\vec{j} := (j_1, \ldots, j_n)$ of elements of $\mathcal{A}$: they represent the
codewords which are employed by Alice to encode $N$ distinct classical messages. A quantum
encoding is then realized by assigning a mapping which, given $j \in \mathcal{A}$, associates to it a density
matrix $\sigma_j \in \mathfrak{S}(\mathcal{H})$ of the quantum carrier that propagates through the channel. Accordingly
each string $\vec{j} \in \mathcal{A}^n$ will be represented by the product state $\sigma_{\vec{j}} := \sigma_{j_1} \otimes \cdots \otimes \sigma_{j_n} \in \mathfrak{S}(\mathcal{H}^{\otimes n})$,
and received by Bob as

$$\rho_{\vec{j}} := \rho_{j_1} \otimes \cdots \otimes \rho_{j_n}, \tag{1}$$

where $\rho_j := \mathcal{T}(\sigma_j)$ is the output density matrix corresponding to the input $\sigma_j$. In this
framework each classical code $\mathcal{C}$ is associated with a quantum code via the following
classical-to-quantum correspondence

$$\mathcal{C} \longrightarrow \mathfrak{C} := \{\rho_{\vec{j}^{(1)}}, \ldots, \rho_{\vec{j}^{(N)}}\}, \tag{2}$$


the states $\rho_{\vec{j}^{(\ell)}}$ being those which Bob has to discriminate in order to recover the message
Alice sent to him while using the code $\mathcal{C}$. For such purpose he will employ a decoding POVM
of elements

$$\Big\{X_1, \ldots, X_N, \; X_0 = 1 - \sum_{\ell=1}^{N} X_\ell\Big\}, \tag{3}$$

whose outcome represents the inferred value of the transmitted message. Specifically, for
$\ell = 1, \ldots, N$, the operator $X_\ell$ is associated with the event where Bob assumes that the
received message is the $\ell$-th one, while $X_0$ is associated with an explicit failure of the
decoding stage. Accordingly the average error probability of the quantum code $\mathfrak{C}$ can
be computed as

$$P_{err}(\mathfrak{C}) := \frac{1}{N} \sum_{\ell=1}^{N} \left[1 - P_{succ}(\ell)\right] = 1 - \frac{1}{N} \sum_{\ell=1}^{N} P_{succ}(\ell), \tag{4}$$

where

$$P_{succ}(\ell) := \mathrm{Tr}\left[X_\ell \, \rho_{\vec{j}^{(\ell)}}\right] \tag{5}$$

is the probability that Bob will successfully retrieve the $\ell$-th codeword when Alice transmits
it.

In the long message limit $n \to \infty$, it has been shown [3, 5] that $P_{err}(\mathfrak{C})$ can be sent to zero
if the number of messages scales as $N = 2^{nR}$, $R$ being the transmission rate of the scheme,
which is bounded by the Holevo theorem. Specifically we must have that

$$R \le \max \chi(\{p_j, \rho_j\}) = C_{Hol}, \tag{6}$$

where on the right-hand side the maximization is performed over all possible input ensembles
$\{p_j, \sigma_j : j \in \mathcal{A}\}$ obtained by selecting the state $\sigma_j$ with probability distribution $p_j$, and where
the Holevo information of the associated output ensemble $\{p_j, \rho_j = \mathcal{T}(\sigma_j) : j \in \mathcal{A}\}$ is defined
as

$$\chi(\{p_j, \rho_j\}) := S\Big(\sum_j p_j \rho_j\Big) - \sum_j p_j S(\rho_j), \tag{7}$$

by means of the von Neumann entropy $S(\rho) = -\mathrm{Tr}[\rho \log_2 \rho]$.
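As an illustration of Eq. (7), the following minimal sketch evaluates the Holevo information of a small qubit ensemble numerically. The function names and the three example states are assumptions introduced here for illustration only; they are not part of the original derivation.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho log2 rho], evaluated on the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # convention: 0 log 0 = 0
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def holevo_information(probabilities, states):
    """chi = S(sum_j p_j rho_j) - sum_j p_j S(rho_j), as in Eq. (7)."""
    average_state = sum(p * rho for p, rho in zip(probabilities, states))
    average_entropy = sum(
        p * von_neumann_entropy(rho) for p, rho in zip(probabilities, states)
    )
    return von_neumann_entropy(average_state) - average_entropy

# Hypothetical qubit output states: |0><0|, |1><1| and |+><+|.
ket0 = np.array([[1.0, 0.0], [0.0, 0.0]])
ket1 = np.array([[0.0, 0.0], [0.0, 1.0]])
plus = np.array([[0.5, 0.5], [0.5, 0.5]])

chi_orthogonal = holevo_information([0.5, 0.5], [ket0, ket1])   # perfectly distinguishable
chi_identical = holevo_information([0.5, 0.5], [ket0, ket0])    # no information
chi_overlapping = holevo_information([0.5, 0.5], [ket0, plus])  # in between
```

Orthogonal outputs carry one full bit per channel use, identical outputs carry none, and non-orthogonal pure states fall strictly in between.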

It is known that the inequality (6) is achievable, in the sense that, for any output ensemble
$\mathcal{E} := \{p_j, \rho_j ; j \in \mathcal{A}\}$, one can identify a set $\mathfrak{C}$ of $N \simeq 2^{n \chi(\{p_j, \rho_j\})}$ quantum codewords and a
decoding POVM (3) for which the error probability (4) goes to zero as $n$ increases. This can
be done by exploiting what, in classical information theory, is known as Shannon’s averaging
trick. The idea is as follows: the ensemble $\mathcal{E}$ can be seen as a source which, when operating
$n$ times, will produce $n$-long product states $\rho_{\vec{j}}$ of the form (1) with probability

$$p_{\vec{j}} = p_{j_1} p_{j_2} \cdots p_{j_n}. \tag{8}$$

Therefore, iterating this operation $N$ times, $\mathcal{E}$ will be able to generate a code $\mathfrak{C}$ defined as
in Eq. (2) with probability

$$P(\mathfrak{C}) = \prod_{\ell=1}^{N} p_{\vec{j}^{(\ell)}} = \prod_{\ell=1}^{N} \prod_{q=1}^{n} p_{j_q^{(\ell)}}, \tag{9}$$

where $\vec{j}^{(\ell)}$ are the codewords of the classical counterpart $\mathcal{C}$ of $\mathfrak{C}$. The set $\mathcal{S} := \{\mathfrak{C}, P(\mathfrak{C})\}$
defines the statistical collection of the quantum codes one can associate to $\mathcal{E}$ for fixed values
of $N$ and $n$. Accordingly, instead of optimizing the total error probability (4) of a single
element of such a set, we can now consider its averaged value with respect to the probability
$P(\mathfrak{C})$,

$$\langle P_{err} \rangle_{\mathcal{S}} := \sum_{\mathfrak{C}} P(\mathfrak{C}) P_{err}(\mathfrak{C}) = 1 - \frac{1}{N} \sum_{\ell=1}^{N} \langle P_{succ}(\ell) \rangle_{\mathcal{S}}, \tag{10}$$

the rationale being that if this quantity can be forced to go to zero in the limit $n \to \infty$ then
at least one code (actually almost all of them) exists in $\mathcal{S}$ for which $P_{err}(\mathfrak{C})$ tends to zero in the
same limit.
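The random-code construction behind Eqs. (8) and (9) can be sketched as follows. This is only an illustration: the helper names, the binary alphabet and the parameter values are assumptions made here, and the sampling is purely classical (it draws the index strings, not the quantum states).

```python
import numpy as np

def sample_random_code(symbol_probabilities, n, N, seed=0):
    """Draw N codewords of length n, each symbol i.i.d. with probabilities p_j.

    Row l of the returned array is the classical string j^(l).
    """
    rng = np.random.default_rng(seed)
    alphabet_size = len(symbol_probabilities)
    return rng.choice(alphabet_size, size=(N, n), p=symbol_probabilities)

def code_probability(code, symbol_probabilities):
    """P(C) of Eq. (9): product of p_j over all symbols of all codewords."""
    p = np.asarray(symbol_probabilities)
    return float(np.prod(p[code]))

# Hypothetical example: binary alphabet, n = 6 symbols, N = 4 codewords.
code = sample_random_code([0.5, 0.5], n=6, N=4)
prob = code_probability(code, [0.5, 0.5])
```

For a uniform binary alphabet every code of this size is generated with the same probability $2^{-nN}$; the average in Eq. (10) weights each code by this quantity.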

The first proof [15] of this fact made use of a single-step decoding POVM (3), known as
pretty good measurement (PGM) or square root measurement, which is extremely efficient
from a theoretical point of view but difficult to implement. More recently, a sequential
decoding scheme has been introduced, which makes use of projective “yes-no”
measurements to verify whether the received state corresponds to a certain codeword or not.
Following an arbitrary ordering of codewords, this question is asked for each of them in turn,
until either a positive answer is obtained for some $\vec{j}$ or else a negative answer for all the
codewords. To some extent the sequential scheme appears to be easier to realize in practice
as it decomposes the process into a series of simple steps, and indeed several proposals have
been made for its use in the context of continuous-variable communication lines [22, 32, 34]. Still
it has a major drawback in its scaling, since a number of operations of order $N = 2^{nR}$ is required
for its application. The protocol presented here is inspired by the sequential decoding but
makes use of a bisection method, performing at each step a “yes-no” measurement for a
set of possible codewords, whose size is progressively halved, allowing Bob to recover the
transmitted message bit-by-bit.


III. MATHEMATICAL TOOLS 

This section reviews some basic facts about typical subspaces and presents some
inequalities which will be useful in proving the optimality of our decoding scheme. For a complete
description of the following properties we refer the reader to Refs. [1], [12], [15], and [36].


A. Typical subspaces 


Consider the average state

$$\rho = \sum_{j \in \mathcal{A}} p_j \rho_j = \sum_{x} q_x |e_x\rangle\langle e_x|, \tag{11}$$

of the quantum source $\mathcal{E} := \{p_j, \rho_j ; j \in \mathcal{A}\}$ and its spectral decomposition in terms of the
eigenbasis $\{|e_x\rangle\}$ of $\rho$ and the eigenvalues $\{q_x\}$. This induces a classical random variable
$X$ with probability distribution $q_x$ which, on $n$ sampling events, produces the sequence
$\vec{x} = (x_1, \ldots, x_n)$ with probability $q_{\vec{x}} = \prod_{i=1}^{n} q_{x_i}$. The classical $\delta$-typical subspace $T_\delta^{n}$ is
defined as the set of such sequences whose sample entropy differs from the expected
entropy of the random variable by less than a given quantity $\delta > 0$:

$$T_\delta^{n} = \{\vec{x} : |H(\vec{x}) - H(X)| < \delta\}, \tag{12}$$

the sample entropy of a sequence being

$$H(\vec{x}) = -\frac{1}{n} \log_2 q_{\vec{x}} = -\frac{1}{n} \sum_{i=1}^{n} \log_2 q_{x_i}, \tag{13}$$

i.e. the average information content of the $n$ symbols in the sequence $\vec{x}$, while the associated
Shannon entropy is defined as usual:

$$H(X) = -\sum_{x} q_x \log_2 q_x = S(\rho), \tag{14}$$

where in the last identity we used the correspondence with the von Neumann entropy
functional of the average state $\rho$. As a consequence, the $\delta$-typical subspace $\mathcal{H}_{typ}$ of the quantum
state $\rho$ is made of all those vectors $|e_{\vec{x}}\rangle$ whose corresponding classical sequence is $\delta$-typical,
i.e. $\vec{x} \in T_\delta^{n}$. The projector on this subspace is given by

$$P = \sum_{\vec{x} \in T_\delta^{n}} |e_{\vec{x}}\rangle\langle e_{\vec{x}}|. \tag{15}$$

Properties similar to those of the classical typical subspace hold for the quantum one, namely

$$\mathrm{Tr}\left[P \rho^{\otimes n}\right] > 1 - \epsilon_1, \tag{16}$$

$$\mathrm{Tr}\left[P\right] \le 2^{n[S(\rho) + \delta]}, \tag{17}$$

$$2^{-n[S(\rho) + \delta]} P \le P \rho^{\otimes n} P \le 2^{-n[S(\rho) - \delta]} P, \tag{18}$$

for $\epsilon_1 > 0$ and $n$ sufficiently large. These properties state respectively that:

• The quantum state $\rho^{\otimes n}$ resides with high probability in the $\delta$-typical subspace of $\rho$;

• The size of the $\delta$-typical subspace is exponentially smaller than the size of the whole
space, unless the source is maximally mixed, i.e. $S(\rho) = \log_2 d$;

• The probability distribution of $\delta$-typical sequences is approximately uniform, $\sim 2^{-nS(\rho)}$.
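The first two properties can be checked numerically in the simplest setting. The sketch below assumes a qubit average state with eigenvalues $(1 - q_1, q_1)$, so that typicality depends only on the number of ones in the sequence; the function name and the values $q_1 = 0.1$, $\delta = 0.1$ are illustrative assumptions.

```python
from math import comb, log2

def typical_subspace_stats(q1, n, delta):
    """Return (S(rho), Tr[P], Tr[P rho^{(x)n}]) for a qubit with spectrum (1-q1, q1).

    A sequence with `ones` entries equal to 1 has probability
    q0**(n-ones) * q1**ones and occurs comb(n, ones) times.
    """
    q0 = 1.0 - q1
    entropy = -(q0 * log2(q0) + q1 * log2(q1))
    dimension, weight = 0, 0.0
    for ones in range(n + 1):
        sample_entropy = -((n - ones) * log2(q0) + ones * log2(q1)) / n
        if abs(sample_entropy - entropy) < delta:  # delta-typical, Eq. (12)
            multiplicity = comb(n, ones)
            dimension += multiplicity
            weight += multiplicity * q0 ** (n - ones) * q1 ** ones
    return entropy, dimension, weight

stats_50 = typical_subspace_stats(0.1, 50, 0.1)
stats_200 = typical_subspace_stats(0.1, 200, 0.1)
stats_1000 = typical_subspace_stats(0.1, 1000, 0.1)
```

With these values the weight of the typical subspace grows towards one as $n$ increases, while its dimension stays below $2^{n[S(\rho)+\delta]}$, in agreement with Eqs. (16) and (17).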


It is finally important to observe that the parameter $\epsilon_1$ entering in Eq. (16) can be linked
to $n$ via an exponential scaling, i.e. $\epsilon_1 = O(e^{-n})$, which ensures that for all polynomial
functions $\mathrm{poly}(n)$ of $n$ one has

$$\lim_{n \to \infty} \mathrm{poly}(n) \, \epsilon_1 = 0, \tag{19}$$

(see Appendix A for details).

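Similar typical subspaces can be identified also for each specific state $\rho_{\vec{j}}$ produced by the
source, i.e. for each codeword in $\mathfrak{C}$, by using the notion of conditional typicality. Indeed each
source state can be seen as a classical-quantum state $|j\rangle\langle j| \otimes \rho_j$ and its spectral decomposition
will be in terms of eigenvectors $\{|j\rangle \otimes |e^{j}_{y}\rangle\}$ and eigenvalues $\{\lambda^{j}_{y}\}$. This again induces the
classical random variables $J$, with probability distribution $p_j$ representing the possible states
emitted by the source, and $Y$, with conditional probability distribution $\lambda^{j}_{y} = p(y|j)$. The
classical $\delta$-conditionally typical subspace is then defined for each $n$-long sequence $\vec{j}$ as

$$T_\delta^{\vec{j}} = \{\vec{y} : |H(\vec{y}|\vec{j}) - H(Y|J)| < \delta\}, \tag{20}$$

where now the entropic quantities are conditional ones, i.e.

$$H(\vec{y}|\vec{j}) = -\frac{1}{n} \log_2 \lambda^{\vec{j}}_{\vec{y}} = -\frac{1}{n} \sum_{i=1}^{n} \log_2 \lambda^{j_i}_{y_i}, \tag{21}$$

$$H(Y|J) = \sum_{j \in \mathcal{A}} p_j H(Y|j) = -\sum_{j \in \mathcal{A}} p_j \sum_{y} \lambda^{j}_{y} \log_2 \lambda^{j}_{y}. \tag{22}$$

The $\delta$-conditionally typical subspace of the quantum codeword state $\rho_{\vec{j}}$ is made of all those
vectors $|e^{\vec{j}}_{\vec{y}}\rangle$ whose corresponding classical sequence is $\delta$-conditionally typical, i.e. $\vec{y} \in T_\delta^{\vec{j}}$.
The projector on this subspace is given by

$$P_{\vec{j}} = \sum_{\vec{y} \in T_\delta^{\vec{j}}} |e^{\vec{j}}_{\vec{y}}\rangle\langle e^{\vec{j}}_{\vec{y}}|. \tag{23}$$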

Given $\epsilon_2 > 0$ and $n$ sufficiently large, the following three main properties hold for the
conditionally typical subspace:

$$\sum_{\vec{j}} p_{\vec{j}} \, \mathrm{Tr}\left[P_{\vec{j}} \, \rho_{\vec{j}}\right] > 1 - \epsilon_2, \tag{24}$$

$$\sum_{\vec{j}} p_{\vec{j}} \, \mathrm{Tr}\left[P_{\vec{j}}\right] \le 2^{n\left[\sum_{j \in \mathcal{A}} p_j S(\rho_j) + \delta\right]}, \tag{25}$$

$$2^{-n\left[\sum_{j \in \mathcal{A}} p_j S(\rho_j) + \delta\right]} P_{\vec{j}} \le P_{\vec{j}} \, \rho_{\vec{j}} \, P_{\vec{j}} \le 2^{-n\left[\sum_{j \in \mathcal{A}} p_j S(\rho_j) - \delta\right]} P_{\vec{j}}, \tag{26}$$

where in the first two expressions the average [31] is taken with respect to the joint probability
of $\mathcal{E}$ introduced in Eq. (8), while the last inequality applies for all $\vec{j}$. As for Eq. (16)
we stress that the parameter $\epsilon_2$ of Eq. (24) can be chosen to have an exponential scaling
in $n$ which guarantees that the condition (19) holds also in this case. Note finally that the
conditionally typical subspaces of different codewords are in general not orthogonal, since
they are built using vectors of two spectral decompositions of the same space $\mathcal{H}^{\otimes n}$.


B. Measurement Lemmas 

We state here some Lemmas which will be used in the rest of the article. They relate
in various ways quantum states before and after a measurement, with the slight but crucial
detail that the latter need not be normalized. Formally one can represent them as
subnormalized density matrices, i.e. positive operators whose trace is smaller than or equal to
one.
An explicit proof of the first three Lemmas can be found in Appendix B; they refer to
properties of the trace norm, which for a generic operator $\Theta$ is defined as $\|\Theta\|_1 = \mathrm{Tr}|\Theta|$,
with $|\Theta| = \sqrt{\Theta^\dagger \Theta}$ being the modulus of $\Theta$. The last Lemma instead was proved by Sen [15]
and provides an alternative, useful way of estimating the error probability of the sequential
decoding protocol of Refs. [13] and [15].

Lemma 1. (Measurement on approximately close states) Let $\rho, \sigma$ be subnormalized density
matrices. Let $E$ be a positive and less-than-one operator, i.e. $0 \le E \le 1$. Then

$$\mathrm{Tr}[E\rho] \ge \mathrm{Tr}[E\sigma] - 2 D(\rho, \sigma), \tag{27}$$

where $D(\rho, \sigma) = \frac{1}{2} \|\rho - \sigma\|_1$ is the trace distance between $\rho$ and $\sigma$.

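Lemma 2. (Gentle operator) Let $\rho$ be a subnormalized density matrix and $E$ a positive
and less-than-one operator, i.e. $0 \le E \le 1$. Let also $\langle \cdots \rangle$ denote the average with respect
to some probability distribution, which $\rho$ and $E$ may depend on. Suppose that, for some
$1 > \epsilon > 0$,

$$\langle \mathrm{Tr}[E\rho] \rangle \ge 1 - \epsilon. \tag{28}$$

Then

$$\left\langle D\left(\sqrt{E} \rho \sqrt{E}, \rho\right) \right\rangle \le \sqrt{\epsilon}. \tag{29}$$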

The two previous lemmas are well known for ordinary density matrices; they can be
proved also for subnormalized ones by use of the following lemma.

Lemma 3. (Alternative form of trace norm for subnormalized states) Let $\omega$ be a hermitian
operator (in particular, $\omega$ could be a subnormalized density matrix). Then

$$\|\omega\|_1 = \max_{-1 \le \Lambda \le 1} \mathrm{Tr}[\Lambda \omega]. \tag{30}$$

Lemma 4. (Contractivity of trace distance for POVM elements) Let $\rho, \sigma$ be subnormalized
density matrices and $0 \le E \le 1$ a positive and less-than-one operator (for example it could
be a POVM element and/or a projector). Then

$$D(E \rho E, E \sigma E) \le D(\rho, \sigma). \tag{31}$$

Proof. Consider the expression of the trace norm of a hermitian operator as in Lemma 3
and apply it to the LHS of (31):

$$2 D(E \rho E, E \sigma E) = \max_{-1 \le \Lambda \le 1} \mathrm{Tr}[\Lambda E (\rho - \sigma) E] \tag{32}$$

$$= \mathrm{Tr}[\bar{\Lambda} E (\rho - \sigma) E] = \mathrm{Tr}[\Lambda' (\rho - \sigma)] \tag{33}$$

$$\le \max_{-1 \le \Lambda \le 1} \mathrm{Tr}[\Lambda (\rho - \sigma)] = 2 D(\rho, \sigma). \tag{34}$$

The second equality follows from explicitly using the operator $\bar{\Lambda}$ which attains the maximum
in (32). The third equality follows from using the cyclic property of the trace and setting
$\Lambda' = E \bar{\Lambda} E$. The inequality follows from the fact that $\Lambda'$ also satisfies $-1 \le \Lambda' \le 1$, since $0 \le E \le 1$. □
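A quick numerical sanity check of Lemmas 1, 2 and 4 on random instances can be sketched as follows. It is not a proof; the dimension, the random-matrix construction and all helper names are assumptions made here. Lemma 2 is tested for a single instance (no averaging), with $\epsilon := 1 - \mathrm{Tr}[E\rho]$.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily

def random_subnormalized_state(dim):
    """Random positive operator with trace drawn uniformly in [0.5, 1]."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rng.uniform(0.5, 1.0) * rho / np.trace(rho).real

def random_effect(dim):
    """Random operator E with 0 <= E <= 1, close to the identity."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = a @ a.conj().T
    return np.eye(dim) - 0.1 * h / np.linalg.eigvalsh(h).max()

def trace_distance(rho, sigma):
    """D(rho, sigma) = (1/2) ||rho - sigma||_1 for hermitian arguments."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def psd_sqrt(op):
    """Square root of a positive semidefinite operator."""
    w, v = np.linalg.eigh(op)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

lemma1_ok = lemma2_ok = lemma4_ok = True
for _ in range(200):
    rho = random_subnormalized_state(4)
    sigma = random_subnormalized_state(4)
    E = random_effect(4)
    d = trace_distance(rho, sigma)
    # Lemma 1: Tr[E rho] >= Tr[E sigma] - 2 D(rho, sigma)
    lemma1_ok &= np.trace(E @ rho).real >= np.trace(E @ sigma).real - 2 * d - 1e-9
    # Lemma 2 (single instance): D(sqrt(E) rho sqrt(E), rho) <= sqrt(eps)
    eps = 1.0 - np.trace(E @ rho).real
    root = psd_sqrt(E)
    lemma2_ok &= trace_distance(root @ rho @ root, rho) <= np.sqrt(eps) + 1e-9
    # Lemma 4: D(E rho E, E sigma E) <= D(rho, sigma)
    lemma4_ok &= trace_distance(E @ rho @ E, E @ sigma @ E) <= d + 1e-9
```

The small tolerance `1e-9` only absorbs floating-point rounding; the inequalities themselves are the ones stated in Eqs. (27), (29) and (31).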


Lemma 5. (Sen’s Lemma) Let $\rho$ be a subnormalized density matrix and $P_1, \ldots, P_k$
orthogonal projectors on subspaces of its Hilbert space. Let also $Q_i = 1 - P_i$ be their complementary
projectors. Then

$$\mathrm{Tr}\left[P_k \cdots P_1 \rho P_1 \cdots P_k\right] \ge \mathrm{Tr}[\rho] - 2 \sqrt{\sum_{i=1}^{k} \mathrm{Tr}[\rho Q_i]}. \tag{35}$$
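The following sketch tests the inequality of Lemma 5 on random instances, assuming it has the form $\mathrm{Tr}[P_k \cdots P_1 \rho P_1 \cdots P_k] \ge \mathrm{Tr}[\rho] - 2\sqrt{\sum_i \mathrm{Tr}[\rho Q_i]}$ (Sen’s non-commutative union bound). The dimension, the projector ranks and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily

def random_state(dim):
    """Random normalized density matrix."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_projector(dim, rank):
    """Projector on the span of `rank` orthonormal random vectors."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(a)
    basis = q[:, :rank]
    return basis @ basis.conj().T

def sen_bound_sides(rho, projectors):
    """Return (lhs, rhs) of the inequality, applying P_1 first and P_k last."""
    identity = np.eye(rho.shape[0])
    filtered = rho
    for p in projectors:
        filtered = p @ filtered @ p
    lhs = np.trace(filtered).real
    penalty = sum(np.trace(rho @ (identity - p)).real for p in projectors)
    rhs = np.trace(rho).real - 2.0 * np.sqrt(max(penalty, 0.0))
    return lhs, rhs

sen_results = [
    sen_bound_sides(random_state(32), [random_projector(32, 31) for _ in range(2)])
    for _ in range(100)
]
sen_ok = all(lhs >= rhs - 1e-9 for lhs, rhs in sen_results)
```

The projectors need not commute, which is the situation relevant to sequential decoding, where the typical projectors of different codewords are in general not orthogonal to each other.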


IV. THE BISECTION PROTOCOL 
A. Description of the protocol 

In this subsection we introduce our decoding protocol (Fig. 1) which, given a density
matrix extracted from an $N = 2^{u_F}$-element quantum code $\mathfrak{C}$ (2), generated by the source
$\mathcal{E}$, tries to identify it by using a bisection method. The measurement process consists of
$u_F = nR$ nested detection events, each aimed at recovering one bit of information from the
transmitted signal.

As a preliminary step, Bob assigns an ordering of the codewords in $\mathfrak{C}$, identifying each
of them with a unique string of $u_F$ bits, $\vec{k} = (k_1, k_2, \ldots, k_{u_F})$, e.g. by providing a binary
representation of their label $\ell \in \{1, \ldots, N\}$. In particular the first bit of the string $\vec{k}$ identifies
two distinct subsets of $\mathfrak{C}$ containing $N/2$ codewords each: the subset $\mathfrak{C}^{(1)}_{0}$ formed by the
codewords whose corresponding strings start with $k_1 = 0$, and the subset $\mathfrak{C}^{(1)}_{1}$ characterized
by those for which instead $k_1 = 1$. The second bit of the string $\vec{k}$ is then used to further
halve $\mathfrak{C}^{(1)}_{0}$ and $\mathfrak{C}^{(1)}_{1}$. Specifically, for $k_1 = 0, 1$, $\mathfrak{C}^{(1)}_{k_1}$ is split into the sub-subsets $\mathfrak{C}^{(2)}_{k_1, k_2 = 0}$
and $\mathfrak{C}^{(2)}_{k_1, k_2 = 1}$ which include the $N/4$ codewords whose bit strings have $k_1$ as first bit and
$k_2 = 0$ and $k_2 = 1$ as second bit, respectively. Proceeding along the same line Bob
hence identifies a hierarchy of subsets organized in $u_F$ sets, the $u$-th one being composed of $2^u$ disjoint
subsets $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_u}$ labelled by the indexes $k_1, k_2, \ldots, k_u$, and containing $2^{u_F - u} = N/2^u$
codewords each. Specifically $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_u}$ is the set formed by the codewords whose identifier string
$\vec{k}$ admits the value $k_1$ as first bit, the value $k_2$ as second bit, $\ldots$, and the value $k_u$ as the
$u$-th bit. By construction, for all $u \in \{1, \ldots, u_F\}$ they fulfill the identities

$$\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0} \cap \mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1} = \emptyset, \tag{36}$$

$$\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0} \cup \mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1} = \mathfrak{C}^{(u-1)}_{k_1, k_2, \ldots, k_{u-1}}, \tag{37}$$

and the completeness relation

$$\mathfrak{C} = \bigcup_{k_1, k_2, \ldots, k_u \in \{0, 1\}} \mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_u}. \tag{38}$$

To recover which codeword Alice is transmitting, Bob performs a sequence of $u_F$
concatenated measurements organized as shown in Fig. 1. The first of these measurements is aimed at
determining the value of the first bit $k_1$ of the bit string associated with the transmitted
codeword, i.e. it allows Bob to determine whether the codeword is in the subset $\mathfrak{C}^{(1)}_{0}$ or in the
subset $\mathfrak{C}^{(1)}_{1}$. The exact form of such procedure will be assigned in the following sections where
three alternative examples of the scheme will be discussed in detail: for the moment it is
sufficient to observe that it can be described as a POVM $\mathcal{M}^{(1)}$ of elements $N^{(1)}_{0}$, $N^{(1)}_{1}$
associated respectively with the outcomes $k_1 = 0$ and $k_1 = 1$, plus a null term $N^{(1)}_{null} = 1 - N^{(1)}_{0} - N^{(1)}_{1}$
associated with the case in which no decision can be made on the value of $k_1$: if this event
occurs Bob simply declares failure of the decoding procedure and stops the protocol (in the
first implementation of the scheme, which we discuss in Sec. IV B 1, this element is not present,
which is equivalent to setting $N^{(1)}_{null} = 0$). Once $k_1$ has been determined, Bob proceeds with
the second step of the protocol, aimed at recovering the value of the bit $k_2$ of the transmitted
codeword. To this purpose, conditioned on the value of $k_1 \in \{0, 1\}$ obtained in the previous
step, Bob now performs a new POVM $\mathcal{M}^{(2)}_{k_1}$ aimed at determining whether the received
codeword belongs to $\mathfrak{C}^{(2)}_{k_1, 0}$ or to $\mathfrak{C}^{(2)}_{k_1, 1}$. Also $\mathcal{M}^{(2)}_{k_1}$ is characterized by three elements: $N^{(2)}_{k_1, 0}$, $N^{(2)}_{k_1, 1}$
corresponding to the cases $k_2 = 0$ and $k_2 = 1$ respectively, and $N^{(2)}_{k_1, null} = 1 - N^{(2)}_{k_1, 0} - N^{(2)}_{k_1, 1}$
corresponding to the failure event (again the explicit expressions for these operators will be
assigned later on). The procedure iterates until Bob either gets a failure event or recovers all
the $u_F$ bits which identify the transmitted codeword. Specifically, assuming that no failures
have occurred in the first $u - 1$ steps, yielding the values $k_1, k_2, \ldots, k_{u-1}$ for the
associated bits, at the $u$-th step Bob performs on the system a POVM $\mathcal{M}^{(u)}_{k_1, k_2, \ldots, k_{u-1}}$ of elements
$N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0}$, $N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$ and $N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, null} = 1 - N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0} - N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$ to
decide whether the received codeword belongs to the set $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0}$ or to $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$.

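Given the above construction, the probability of recovering a given string of bits $\vec{k} =
(k_1, k_2, \ldots, k_{u_F})$ when measuring an input state $\rho_{\vec{j}} \in \mathfrak{C}$ can now be expressed along the
lines detailed in Appendix C, i.e.

$$P(\vec{k} | \rho_{\vec{j}}) = \mathrm{Tr}\left[F_{\vec{k}} \, \rho_{\vec{j}}\right], \tag{39}$$

with $F_{\vec{k}}$ defined as

$$F_{\vec{k}} := \sqrt{N^{(1)}_{k_1}} \cdots \sqrt{N^{(u_F - 1)}_{k_1, k_2, \ldots, k_{u_F - 1}}} \; N^{(u_F)}_{k_1, k_2, \ldots, k_{u_F}} \; \sqrt{N^{(u_F - 1)}_{k_1, k_2, \ldots, k_{u_F - 1}}} \cdots \sqrt{N^{(1)}_{k_1}}, \tag{40}$$

with $N^{(u)}_{k_1, k_2, \ldots, k_u}$ the operators which define the POVMs of the
protocol. The success probability of the procedure then follows from this expression by
simply setting $\vec{k}$ to coincide with the binary string associated with the selected $\vec{j}$, e.g. in
the case of the $\ell$-th codeword

$$P_{succ}(\ell) = P(\vec{k}^{(\ell)} | \rho_{\vec{j}^{(\ell)}}) = \mathrm{Tr}\left[F_{\vec{k}^{(\ell)}} \, \rho_{\vec{j}^{(\ell)}}\right], \tag{41}$$

where $\vec{k}^{(\ell)}$ is the binary string corresponding to the index $\ell$ which defines the selected
vector $\vec{j}^{(\ell)}$.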
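To make Eqs. (39)-(41) concrete, the sketch below evaluates $F_{\vec{k}}$ in an idealized toy model that is not one of the constructions of this paper: the $N = 8$ codewords are assumed to be mutually orthogonal (computational basis states), so that each POVM element can be taken as the projector on the span of the corresponding subset and no null outcome is needed. The depolarizing-type noise and all names are illustrative assumptions.

```python
import numpy as np

def bits_of(label, n_bits):
    """Bit string k of the codeword label in {1, ..., 2**n_bits}."""
    return tuple(((label - 1) >> (n_bits - 1 - i)) & 1 for i in range(n_bits))

def subset_projector(prefix, n_bits):
    """Toy POVM element: projector on the basis states whose label starts with prefix."""
    prefix = tuple(prefix)
    diagonal = [
        1.0 if bits_of(label, n_bits)[: len(prefix)] == prefix else 0.0
        for label in range(1, 2 ** n_bits + 1)
    ]
    return np.diag(diagonal)

def psd_sqrt(op):
    """Square root of a positive semidefinite operator."""
    w, v = np.linalg.eigh(op)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def outcome_operator(k, n_bits):
    """F_k = M_1 ... M_{uF-1} N_{uF} M_{uF-1} ... M_1 with M_u = sqrt(N_u), Eq. (40)."""
    elements = [subset_projector(k[:u], n_bits) for u in range(1, n_bits + 1)]
    left = np.eye(2 ** n_bits)
    for element in elements[:-1]:
        left = left @ psd_sqrt(element)
    return left @ elements[-1] @ left.conj().T

def outcome_probability(k, rho, n_bits):
    """P(k | rho) = Tr[F_k rho], Eq. (39)."""
    return float(np.trace(outcome_operator(k, n_bits) @ rho).real)

n_bits, dim = 3, 8

def noisy_codeword(label, noise):
    """Basis state |label><label| mixed with the maximally mixed state."""
    pure = np.zeros((dim, dim))
    pure[label - 1, label - 1] = 1.0
    return (1.0 - noise) * pure + noise * np.eye(dim) / dim

p_success_clean = [
    outcome_probability(bits_of(label, n_bits), noisy_codeword(label, 0.0), n_bits)
    for label in range(1, dim + 1)
]
p_success_noisy = outcome_probability(bits_of(3, n_bits), noisy_codeword(3, 0.2), n_bits)
p_wrong = outcome_probability(bits_of(4, n_bits), noisy_codeword(3, 0.0), n_bits)
```

In this toy model the nested projectors commute and their product is the projector on the transmitted basis state, so noiseless codewords are decoded with certainty after $u_F = 3$ binary measurements instead of up to $N = 8$ sequential ones. The nontrivial part of the analysis below concerns codewords that are only approximately distinguishable.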

B. Attainability of the Holevo Bound via a bisection protocol 


As detailed in the previous paragraphs, a bisection protocol aimed at decoding an $N = 2^{nR}$-codeword
quantum code $\mathfrak{C}$ is defined by assigning a family of three-outcome POVMs $\mathcal{M}^{(u)}_{k_1, \ldots, k_{u-1}}$
identified by the integer index $u \in \{1, \ldots, u_F = nR\}$ and by the binary labels
$k_1, \ldots, k_{u-1} \in \{0, 1\}$ and concatenated as schematically shown in Fig. 1. In this section
we are going to show that, as long as the rate $R$ respects the inequality (6), it is
possible to assign the operators $\{N^{(u)}_{k_1, \ldots, k_u}\}$ which define the measurements
$\{\mathcal{M}^{(u)}_{k_1, \ldots, k_{u-1}}\}$, with $u \in \{1, \ldots, u_F = nR\}$ and $k_1, \ldots, k_{u-1} \in \{0, 1\}$, in such a way that, in the limit of large $n$, the
corresponding success probability (41) converges asymptotically to 1 when averaged over all
possible codes generated by the output source $\mathcal{E}$, i.e.

$$\lim_{n \to \infty} \langle P_{succ}(\ell) \rangle_{\mathcal{S}} = 1. \tag{42}$$

Accordingly the corresponding average error probability (10) asymptotically vanishes,
hence proving that bisection decoding procedures can be used to saturate the Holevo bound. In
order to achieve this goal we start by presenting a sufficient condition on $N^{(u)}_{k_1, \ldots, k_u}$ which, if
fulfilled, would yield the limit (42) independently of the value of $R$, see Theorem 1.
Subsequently we show that for all rates $R$ respecting the Holevo bound (6) we can indeed fulfil such
sufficient condition. This is done by presenting three independent choices of the operators
$\{N^{(u)}_{k_1, \ldots, k_u}\}$, corresponding to three different ways of constructing
the bisection scheme: via orthogonal projections (see Sec. IV B 1); via PGM detections (see
Sec. IV B 2); and via sequential detections (see Sec. IV B 3).

Theorem 1 (Sufficient Condition). For $n$ integer let $\mathfrak{C}$ be a quantum code formed by $N =
2^{nR}$ separable codewords $\rho_{\vec{j}}$ of length $n$ extracted from the output ensemble $\mathcal{E}$, and consider a bisection
protocol with POVMs $\{\mathcal{M}^{(u)}_{k_1, \ldots, k_{u-1}}\}$ characterized by the operators
$\{N^{(u)}_{k_1, \ldots, k_u}\}$, with $u \in \{1, \ldots, u_F = nR\}$ and $k_1, \ldots, k_{u-1} \in \{0, 1\}$. The corresponding success probability (41) converges
to one as in Eq. (42) when averaged over all possible codes generated by the output ensemble
$\mathcal{E}$ if, for all $\ell \in \{1, \ldots, N\}$ and $u \in \{1, \ldots, u_F\}$, one of the following conditions is fulfilled:

i)

$$\left\langle \mathrm{Tr}\left[N^{(u)}_{k_1^{(\ell)}, \ldots, k_u^{(\ell)}} \, \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} > 1 - \epsilon(n); \tag{43}$$

ii)

$$\left\langle \mathrm{Tr}\left[N^{(u)}_{k_1^{(\ell)}, \ldots, k_u^{(\ell)}} \, P \rho_{\vec{j}^{(\ell)}} P\right] \right\rangle_{\mathcal{S}} > 1 - \epsilon(n), \tag{44}$$

with $\epsilon(n) > 0$ being a function that decreases asymptotically to zero faster than $1/n^2$ as $n$
goes to infinity and $P$ being the projector on the typical subspace of the average codeword
associated with the output ensemble $\mathcal{E}$ (in the above expressions $k_1^{(\ell)}, \ldots, k_u^{(\ell)}$ are the first $u$
elements of the binary string $\vec{k}^{(\ell)}$ which represents the codeword index $\ell$ that labels the density
matrix $\rho_{\vec{j}^{(\ell)}}$).


Proof. We start by directly proving that Eq. (43) is a sufficient condition for Eq. (42), part
i) of the theorem. Part ii) of the theorem is then obtained by showing that Eq. (44) implies
Eq. (43).
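Part i): The success probability (5) that an element $\rho_{\vec{j}^{(\ell)}} \in \mathfrak{C}$ will be correctly decoded by
the bisection procedure characterized by the operators $\{N^{(u)}_{k_1, \ldots, k_u}\}$
can be computed as in Eq. (41), with $\vec{k}^{(\ell)}$ being the identifying bit string that Bob has
assigned to the $\ell$-th codeword. To put a bound on the average value of this quantity over
the collection $\mathcal{S}$ of quantum codes emitted by the source $\mathcal{E}$, we observe that

$$\begin{aligned}
\langle P_{succ}(\ell) \rangle_{\mathcal{S}} &= \left\langle \mathrm{Tr}\left[M_{u_F}^2 M_{u_F - 1} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 1}\right] \right\rangle_{\mathcal{S}} \\
&\ge \left\langle \mathrm{Tr}\left[M_{u_F}^2 \rho_{\vec{j}^{(\ell)}}\right] - 2 D\left(M_{u_F - 1} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 1}, \; \rho_{\vec{j}^{(\ell)}}\right) \right\rangle_{\mathcal{S}},
\end{aligned} \tag{45}$$

where for ease of notation we introduced $M_u = \sqrt{N^{(u)}_{k_1^{(\ell)}, \ldots, k_u^{(\ell)}}}$ and applied Lemma 1 with
$E = M_{u_F}^2$, $\rho = M_{u_F - 1} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 1}$, and $\sigma = \rho_{\vec{j}^{(\ell)}}$. By use of the triangular
inequality we also observe that

$$\begin{aligned}
D&\left(M_{u_F - 1} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 1}, \; \rho_{\vec{j}^{(\ell)}}\right) \\
&\le D\left(M_{u_F - 1} \rho_{\vec{j}^{(\ell)}} M_{u_F - 1}, \; \rho_{\vec{j}^{(\ell)}}\right) + D\left(M_{u_F - 1} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 1}, \; M_{u_F - 1} \rho_{\vec{j}^{(\ell)}} M_{u_F - 1}\right) \\
&\le D\left(M_{u_F - 1} \rho_{\vec{j}^{(\ell)}} M_{u_F - 1}, \; \rho_{\vec{j}^{(\ell)}}\right) + D\left(M_{u_F - 2} \cdots M_1 \, \rho_{\vec{j}^{(\ell)}} \, M_1 \cdots M_{u_F - 2}, \; \rho_{\vec{j}^{(\ell)}}\right) \\
&\le \sum_{u=1}^{u_F - 1} D\left(M_u \rho_{\vec{j}^{(\ell)}} M_u, \; \rho_{\vec{j}^{(\ell)}}\right),
\end{aligned} \tag{46}$$

where the second inequality follows from Lemma 4 while the third one follows by direct iteration of
the previous steps. Inserted into Eq. (45), this finally yields

$$\langle P_{succ}(\ell) \rangle_{\mathcal{S}} \ge \left\langle \mathrm{Tr}\left[M_{u_F}^2 \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} - 2 \sum_{u=1}^{u_F - 1} \left\langle D\left(M_u \rho_{\vec{j}^{(\ell)}} M_u, \; \rho_{\vec{j}^{(\ell)}}\right) \right\rangle_{\mathcal{S}}. \tag{47}$$

Assume now that Eq. (43) holds. Accordingly, for all $u \in \{1, \ldots, u_F\}$ and $\ell = 1, \ldots, N$ we have

$$\left\langle \mathrm{Tr}\left[M_u^2 \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} > 1 - \epsilon(n), \tag{48}$$

with $\epsilon(n)$ being a positive function which goes to zero faster than $1/n^2$. Then, thanks to
Lemma 2 applied to each term of the sum in Eq. (47) with $E = M_u^2$, we can write

$$\langle P_{succ}(\ell) \rangle_{\mathcal{S}} > 1 - \epsilon(n) - 2 n R \sqrt{\epsilon(n)}, \tag{49}$$

which forces $\langle P_{succ}(\ell) \rangle_{\mathcal{S}}$ to converge to 1 as $n \to \infty$, because $n \sqrt{\epsilon(n)} \to 0$ whenever $\epsilon(n)$
decays faster than $1/n^2$. This shows that Eq. (43) is indeed a
sufficient condition for (42).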
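The role of the decay rate of $\epsilon(n)$ in the bound (49) can be seen numerically. In the sketch below the two decay laws $\epsilon(n) = n^{-3}$ and $\epsilon(n) = n^{-2}$, the rate $R = 0.5$ and the function names are illustrative assumptions, not quantities derived in the text.

```python
def success_lower_bound(n, rate, epsilon):
    """Right-hand side of Eq. (49): 1 - eps(n) - 2 n R sqrt(eps(n))."""
    eps = epsilon(n)
    return 1.0 - eps - 2.0 * n * rate * eps ** 0.5

def cubic_decay(n):
    """Hypothetical eps(n) = n**-3, faster than 1/n**2."""
    return n ** -3.0

def quadratic_decay(n):
    """Hypothetical eps(n) = n**-2, the borderline case."""
    return n ** -2.0

lengths = (10 ** 2, 10 ** 4, 10 ** 6)
bounds_cubic = [success_lower_bound(n, 0.5, cubic_decay) for n in lengths]
bounds_quadratic = [success_lower_bound(n, 0.5, quadratic_decay) for n in lengths]
```

With the cubic decay the bound approaches one (about 0.9, 0.99 and 0.999 for the three lengths), whereas with $\epsilon(n) = n^{-2}$ the term $2nR\sqrt{\epsilon(n)} = 2R$ does not vanish and the bound stays trivial for $R = 0.5$. This is why the theorem requires a decay faster than $1/n^2$.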
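Part ii): To prove that Eq. (44) is a sufficient condition for (42) we invoke Lemma 1 with
$E = M_u^2$, $\rho = \rho_{\vec{j}^{(\ell)}}$ and $\sigma = P \rho_{\vec{j}^{(\ell)}} P$, obtaining the following inequality

$$\left\langle \mathrm{Tr}\left[M_u^2 \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} \ge \left\langle \mathrm{Tr}\left[M_u^2 P \rho_{\vec{j}^{(\ell)}} P\right] \right\rangle_{\mathcal{S}} - 2 \left\langle D\left(\rho_{\vec{j}^{(\ell)}}, P \rho_{\vec{j}^{(\ell)}} P\right) \right\rangle_{\mathcal{S}}. \tag{50}$$

From Eq. (16) we also know that for $n$ sufficiently large and $\epsilon_1 = O(e^{-n})$ one has

$$\left\langle \mathrm{Tr}\left[P \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} = \mathrm{Tr}\left[P \left\langle \rho_{\vec{j}^{(\ell)}} \right\rangle_{\mathcal{S}}\right] = \mathrm{Tr}\left[P \rho^{\otimes n}\right] > 1 - \epsilon_1, \tag{51}$$

where we used the fact that the average over $\mathcal{S}$ of the $\ell$-th codeword corresponds to the
average with respect to the joint probability (8) of $\rho_{\vec{j}}$, i.e.

$$\left\langle \rho_{\vec{j}^{(\ell)}} \right\rangle_{\mathcal{S}} = \sum_{\vec{j}} p_{\vec{j}} \, \rho_{\vec{j}} = \rho^{\otimes n}. \tag{52}$$

Accordingly, via Lemma 2 with $E = P$, we can conclude that

$$\left\langle D\left(\rho_{\vec{j}^{(\ell)}}, P \rho_{\vec{j}^{(\ell)}} P\right) \right\rangle_{\mathcal{S}} \le \sqrt{\epsilon_1}, \tag{53}$$

which inserted in Eq. (50) finally yields

$$\left\langle \mathrm{Tr}\left[M_u^2 \rho_{\vec{j}^{(\ell)}}\right] \right\rangle_{\mathcal{S}} \ge \left\langle \mathrm{Tr}\left[M_u^2 P \rho_{\vec{j}^{(\ell)}} P\right] \right\rangle_{\mathcal{S}} - 2 \sqrt{\epsilon_1}, \tag{54}$$

which thanks to (44) implies the inequality (43) and hence, via part i) of the theorem,
Eq. (42). □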


Thanks to Theorem 1 we can now prove that bisection protocols allow one to attain the
Holevo bound, by showing that, for all rates $R$ respecting (6), it is possible to identify
operators $\{N^{(u)}_{k_1, \ldots, k_u}\}$ fulfilling Eq. (44). Ideally one way of building the
POVMs $\mathcal{M}^{(u)}_{k_1, \ldots, k_{u-1}}$ which define the bisection decoding procedure would be to identify their
elements $N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0}$, $N^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$ with the projectors on the subspaces spanned by the
codewords of the sets $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0}$ and $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$ respectively. This is not possible, however,
because such spaces are in general not orthogonal, though we expect typical
subspaces of different codewords of the source to be disjoint in the large-$n$ limit: some kind
of regularization is hence necessary. In the following we shall present three alternative, yet
asymptotically equivalent, ways to realize this: the first makes use of orthogonal projections
on subspaces identified by treating asymmetrically the sets $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 0}$ and $\mathfrak{C}^{(u)}_{k_1, k_2, \ldots, k_{u-1}, 1}$;
the second is based on the PGM construction; and finally the third makes use of the POVM
elements of the sequential protocol of Refs. [13]-[15].
1. Method 1: orthogonal projections

Consider the set $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$. For each one of its codewords $\rho_{\vec{j}^{(\ell)}}$ we can associate a typical subspace $\mathcal{H}^{typ}_{\vec{j}^{(\ell)}}$ and a corresponding projector $P_{\vec{j}^{(\ell)}}$ along the lines detailed in Sec. III A. Next we construct the subspace $\mathcal{H}^{(u)}_{k_1,\dots,k_{u-1},0}$ spanned by the vectors which can be written as a direct sum of the elements of the $\mathcal{H}^{typ}_{\vec{j}^{(\ell)}}$'s of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$, i.e.

$$\mathcal{H}^{(u)}_{k_1,\dots,k_{u-1},0} := \bigoplus_{\ell \in \mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}} \mathcal{H}^{typ}_{\vec{j}^{(\ell)}}, \qquad (55)$$

where the sum is performed over the $\ell$'s whose corresponding vector $\rho_{\vec{j}^{(\ell)}}$ belongs to the set $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$. By construction it follows that each one of the $\mathcal{H}^{typ}_{\vec{j}^{(\ell)}}$ associated to $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$ is a proper subspace of $\mathcal{H}^{(u)}_{k_1,\dots,k_{u-1},0}$. Accordingly, indicating with $P^{(u)}_{k_1,\dots,k_{u-1},0}$ the projector on $\mathcal{H}^{(u)}_{k_1,\dots,k_{u-1},0}$, we have that

$$P^{(u)}_{k_1,\dots,k_{u-1},0} \geq P_{\vec{j}^{(\ell)}} \qquad (56)$$

for all codewords of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$. Also, due to the partial overlapping among the $\mathcal{H}^{typ}_{\vec{j}^{(\ell)}}$ of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$, the sum of the associated $P_{\vec{j}^{(\ell)}}$'s will in general be larger than $P^{(u)}_{k_1,\dots,k_{u-1},0}$, i.e.

$$P^{(u)}_{k_1,\dots,k_{u-1},0} \leq \sum_{\ell \in \mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}} P_{\vec{j}^{(\ell)}}. \qquad (57)$$

We define the orthogonal projections method for the bisection POVM by identifying $N^{(u)}_{k_1,\dots,k_{u-1},0}$ with $P^{(u)}_{k_1,\dots,k_{u-1},0}$ and $N^{(u)}_{k_1,\dots,k_{u-1},1}$ with its complementary counterpart, i.e.

$$N^{(u)}_{k_1,\dots,k_{u-1},0} := P^{(u)}_{k_1,\dots,k_{u-1},0}, \qquad (58)$$

$$N^{(u)}_{k_1,\dots,k_{u-1},1} := 1 - N^{(u)}_{k_1,\dots,k_{u-1},0} = 1 - P^{(u)}_{k_1,\dots,k_{u-1},0}. \qquad (59)$$
A couple of remarks are in order:

i) notice that $N^{(u)}_{k_1,\dots,k_{u-1},1}$ does not coincide with the projector $P^{(u)}_{k_1,\dots,k_{u-1},1}$ on the subspace $\mathcal{H}^{(u)}_{k_1,\dots,k_{u-1},1}$ formed by the direct sum of the typical subspaces $\mathcal{H}^{typ}_{\vec{j}^{(\ell)}}$ associated with $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$. Notice also that, due to the partial overlapping of the typical subspaces of different codewords, in general we can neither establish an inequality similar to Eq. (56) which links $N^{(u)}_{k_1,\dots,k_{u-1},1}$ and the $P_{\vec{j}^{(\ell)}}$ of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$, nor fix an ordering between $N^{(u)}_{k_1,\dots,k_{u-1},1}$ and $P^{(u)}_{k_1,\dots,k_{u-1},1}$;

ii) by construction the scheme we are analyzing here does not include the possibility of the null event described in the previous section. Indeed in this case we have

$$N^{(u)}_{k_1,\dots,k_{u-1},null} = 1 - N^{(u)}_{k_1,\dots,k_{u-1},0} - N^{(u)}_{k_1,\dots,k_{u-1},1} = 0. \qquad (60)$$

The associated set POVM $\mathcal{M}^{(u)}_{k_1,\dots,k_{u-1}}$ is thus a projective measurement which admits only two possible outcomes, $k_u = 0$ and $k_u = 1$.


From Theorem 1 the asymptotic attainability of the Holevo bound with this procedure can be established by showing that Eq. (44) holds, i.e. explicitly

Lemma 6. For all rates R satisfying the Holevo bound (6), the bisection scheme associated with operators $\{N^{(u)}_{k_1,\dots,k_u}\}$ defined as in Eqs. (58), (59) fulfills the sufficient condition (44).


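Proof. Consider first the case with $k_u = 0$. Given then a generic codeword $\rho_{\vec{j}^{(\ell)}}$ of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$ we can write

$$\begin{aligned}
\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)},0}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S
&= \left\langle \mathrm{Tr}\left[ P^{(u)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)},0}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S
\geq \left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \\
&= \sum_{\vec{j}} p_{\vec{j}}\, \mathrm{Tr}\left[ P_{\vec{j}}\, \tilde{\rho}_{\vec{j}} \right]
\geq \left\langle \mathrm{Tr}\left[ P_{\vec{j}}\, \rho_{\vec{j}} \right] - 2 D\left( \rho_{\vec{j}}, \tilde{\rho}_{\vec{j}} \right) \right\rangle
> 1 - \epsilon_2 - 2\sqrt{\epsilon_1},
\end{aligned} \qquad (61)$$

where for ease of notation we set $\tilde{\rho}_{\vec{j}^{(\ell)}} := P \rho_{\vec{j}^{(\ell)}} P$ and where we used the fact that taking the average with respect to the statistical collection S of $\mathrm{Tr}[P_{\vec{j}^{(\ell)}} \tilde{\rho}_{\vec{j}^{(\ell)}}]$ is equivalent to taking the average of $\mathrm{Tr}[P_{\vec{j}} \tilde{\rho}_{\vec{j}}]$ with respect to $p_{\vec{j}}$, i.e.

$$\left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S = \sum_{\vec{j}} p_{\vec{j}}\, \mathrm{Tr}\left[ P_{\vec{j}}\, \tilde{\rho}_{\vec{j}} \right]. \qquad (62)$$

The first inequality of Eq. (61) follows from Eq. (56); the second inequality follows instead from applying Lemma 1 with $E = P_{\vec{j}}$, $\rho = \tilde{\rho}_{\vec{j}}$ and $\sigma = \rho_{\vec{j}}$; while finally the third inequality follows both from the high probability of projecting codeword $\rho_{\vec{j}}$ on its conditionally typical subspace (24) and from the same concept for the average codeword, together with Lemma 2 (as in (51)-(53)), the parameters $\epsilon_1$ and $\epsilon_2$ being both exponentially small in n to guarantee the limit property (19). Equation (61) proves hence that Eq. (44) applies at least for the sets $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$ with $k_u = 0$.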
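Take next $k_u = 1$ and a generic codeword $\rho_{\vec{j}^{(\ell)}}$ of $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$. In this case we have

$$\begin{aligned}
\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)},1}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S
&= \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - \left\langle \mathrm{Tr}\left[ P^{(u)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)},0}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \\
&\geq \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - \sum_{\ell' \neq \ell} \left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell')}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S,
\end{aligned} \qquad (63)$$

where the inequality follows from (57) plus adding all the remaining terms $P_{\vec{j}^{(\ell')}}$ associated with codewords having $\ell' \neq \ell$. Observe then that from Eq. (16) we have

$$\left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S = \sum_{\vec{j}} p_{\vec{j}}\, \mathrm{Tr}\left[ P \rho_{\vec{j}} \right] = \mathrm{Tr}\left[ P \rho^{\otimes n} \right] \geq 1 - \epsilon_1, \qquad (64)$$

with $\epsilon_1$ being an exponentially small function of n. Furthermore for each term of the sum on the RHS of Eq. (63) we have

$$\begin{aligned}
\left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell')}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S
&= \sum_{\vec{j},\vec{j}'} p_{\vec{j}}\, p_{\vec{j}'}\, \mathrm{Tr}\left[ P_{\vec{j}'}\, \tilde{\rho}_{\vec{j}} \right]
= \sum_{\vec{j}'} p_{\vec{j}'}\, \mathrm{Tr}\left[ P_{\vec{j}'}\, P \rho^{\otimes n} P \right] \\
&\leq \left\| P \rho^{\otimes n} P \right\|_\infty \sum_{\vec{j}'} p_{\vec{j}'}\, \mathrm{Tr}\left[ P_{\vec{j}'} \right]
\leq 2^{-n[S(\rho) - \delta]}\, 2^{n\left[\sum_j p_j S(\rho_j) + \delta\right]} = 2^{-n[\chi(\{p_j, \rho_j\}) - 2\delta]},
\end{aligned} \qquad (65)$$

where the second inequality follows from the typical subspaces' properties (18), (25) and where $\chi(\{p_j, \rho_j\})$ is the Holevo information (7) of the source S. Replacing (64) and (65) into Eq. (63) we hence arrive at

$$\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)},1}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq 1 - \epsilon_1 - 2^{nR}\, 2^{-n[\chi(\{p_j, \rho_j\}) - 2\delta]}, \qquad (66)$$

which shows that as long as the rate R respects the Holevo bound (6), i.e.

$$R < \chi(\{p_j, \rho_j\}) - 2\delta, \qquad (67)$$

for some $\delta > 0$, Eq. (44) applies also for the sets $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$ with $k_u = 1$. □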


The inequalities (61) and (66) prove that under the constraint (67) the proposed implementation of the bisection decoding scheme asymptotically attains the Holevo bound, yielding an average error probability which converges to zero in the limit $n \to \infty$.
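The role of the rate constraint (67) can be made concrete with a toy ensemble. The sketch below is an illustration under stated assumptions, not part of the proof: it takes two equiprobable pure qubit states with a chosen overlap (so that $\chi = S(\rho)$), and evaluates the term $2^{nR} 2^{-n[\chi - 2\delta]}$ of Eq. (66) below and above the threshold. The function names and the numerical values of the overlap, $R$ and $\delta$ are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def holevo_two_pure_qubits(overlap):
    """chi for two equiprobable pure states with |<a|b>| = overlap.

    Pure states have zero entropy, so chi = S(rho); the average state
    has eigenvalues (1 +/- overlap) / 2.
    """
    return binary_entropy((1.0 + overlap) / 2.0)

def union_term(n, rate, chi, delta):
    """The term 2^{nR} 2^{-n[chi - 2 delta]} appearing in Eq. (66)."""
    return 2.0 ** (-n * (chi - 2.0 * delta - rate))

chi = holevo_two_pure_qubits(1.0 / math.sqrt(2.0))  # states |0> and |+>
for n in (10, 100):
    print(n, union_term(n, 0.4, chi, 0.05), union_term(n, 0.6, chi, 0.05))
```

For the rate below $\chi - 2\delta$ the term decays exponentially in n, while above it the term grows and the bound (66) becomes vacuous.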


2. Method 2: via PGM detections 

An alternative way to implement the bisection protocol is substituting the sequential set 
measurement N with one inspired by the Pretty Good Measurement (PGM), first introduced 
to demonstrate the achievability of the Holevo bound. 
For each set $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$ define the positive operator

$$S^{(u)}_{k_1,\dots,k_u} := \sum_{\ell \in \mathcal{C}^{(u)}_{k_1,\dots,k_u}} P_{\vec{j}^{(\ell)}}, \qquad (68)$$

i.e. the sum of the projectors of all the codewords in that set. From the non-orthogonality of projectors and the completeness property (37) it follows


o( u “1) _ c( u ) I q( u ) > 1 

J k 1 ,— ,k u - 1 °ki,— ,k u -i,0 ' °k!,— — 


(69) 


Thus we can build the u-th measurement to decide whether the word belongs to $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$ or $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$ by using the sum operators for these two sets, renormalized by the sum operator for $\mathcal{C}^{(u-1)}_{k_1,\dots,k_{u-1}}$, which contains both of them at the previous step:

$$N^{(u)}_{k_1,\dots,k_u} := \left[ S^{(u-1)}_{k_1,\dots,k_{u-1}} \right]^{-1/2} S^{(u)}_{k_1,\dots,k_u} \left[ S^{(u-1)}_{k_1,\dots,k_{u-1}} \right]^{-1/2}, \qquad (70)$$

where the inverse $[S^{(u-1)}_{k_1,\dots,k_{u-1}}]^{-1/2}$ is meant to be computed only on the support of $S^{(u-1)}_{k_1,\dots,k_{u-1}}$ (otherwise the operator is assumed to be null). In this way we obtain a proper set POVM, since the renormalization allows us to take into account the intersections between typical subspaces of different codewords, i.e.

$$0 \leq N^{(u)}_{k_1,\dots,k_u} \leq \sum_{k \in \{0,1\}} N^{(u)}_{k_1,\dots,k_{u-1},k} = \left[ S^{(u-1)}_{k_1,\dots,k_{u-1}} \right]^{-1/2} \left[ S^{(u)}_{k_1,\dots,k_{u-1},0} + S^{(u)}_{k_1,\dots,k_{u-1},1} \right] \left[ S^{(u-1)}_{k_1,\dots,k_{u-1}} \right]^{-1/2} \leq 1.$$

As in the case discussed previously we can now show that for all R fulfilling the Holevo bound (6) the operators (70) satisfy the sufficient condition Eq. (44) of Theorem 1.

Lemma 7. For all rates R satisfying the Holevo bound (6), the bisection scheme associated with operators $\{N^{(u)}_{k_1,\dots,k_u}\}$ defined as in Eq. (70) fulfills the sufficient condition (44).


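Proof. Observe that

$$\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq \left\langle \mathrm{Tr}\left[ \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} P_{\vec{j}^{(\ell)}} \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S, \qquad (71)$$

where the latter is the average success probability of recovering the $\ell$-th codeword from the set $\mathcal{C}^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}}$ while using a PGM strategy, the operator $[S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}}]^{-1/2} P_{\vec{j}^{(\ell)}} [S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}}]^{-1/2}$ being the corresponding POVM element.

Accordingly we can bound each of the terms on the RHS of Eq. (47) by exploiting the efficiency of the PGM protocol. Specifically, we employ the Hayashi-Nagaoka inequality [11]

$$1 - \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} P_{\vec{j}^{(\ell)}} \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} \leq 2\, Q_{\vec{j}^{(\ell)}} + 4 \sum_{\ell' \neq \ell} P_{\vec{j}^{(\ell')}}, \qquad (72)$$

with $Q_{\vec{j}} := 1 - P_{\vec{j}}$, to write the average success probability as

$$\begin{aligned}
\left\langle \mathrm{Tr}\left[ \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} P_{\vec{j}^{(\ell)}} \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S
&\geq \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - 2 \left\langle \mathrm{Tr}\left[ Q_{\vec{j}^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - 4 \sum_{\ell' \neq \ell} \left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell')}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \qquad (73) \\
&\geq 1 - \epsilon_1 - 2\left( \epsilon_2 + 2\sqrt{\epsilon_1} \right) - 4 \cdot 2^{nR} \cdot 2^{-n[\chi(\{p_j, \rho_j\}) - 2\delta]}, \qquad (74)
\end{aligned}$$

where the last inequality follows from Eqs. (51) and (65) and the fact that

$$\begin{aligned}
\left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}}\, Q_{\vec{j}^{(\ell)}} \right] \right\rangle_S
&= \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}}\, P_{\vec{j}^{(\ell)}} \right] \right\rangle_S
\leq 1 - \left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \\
&= 1 - \sum_{\vec{j}} p_{\vec{j}}\, \mathrm{Tr}\left[ P_{\vec{j}}\, \tilde{\rho}_{\vec{j}} \right]
\leq 1 - \left\langle \mathrm{Tr}\left[ P_{\vec{j}}\, \rho_{\vec{j}} \right] - 2 D\left( \rho_{\vec{j}}, \tilde{\rho}_{\vec{j}} \right) \right\rangle
\leq \epsilon_2 + 2\sqrt{\epsilon_1},
\end{aligned} \qquad (75)$$

which is derived as in Eq. (61). Similarly to what we observed in Eq. (66), it then follows that if the rate R fulfills the constraint (67) for some $\delta > 0$, then for n sufficiently large one has that, for all u and $\ell$,

$$\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq \left\langle \mathrm{Tr}\left[ \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} P_{\vec{j}^{(\ell)}} \left[ S^{(u-1)}_{k_1^{(\ell)},\dots,k_{u-1}^{(\ell)}} \right]^{-1/2} \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S > 1 - \epsilon_3, \qquad (76)$$

with $\epsilon_3 = O(e^{-n})$ being exponentially small in n and fulfilling the condition (19), showing hence that (44) is satisfied by the selected operators. □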


3. Method 3: via sequential POVM 

Another way to regularize the set-projection operators necessary to implement the bisection scheme is to make use of the sequential protocol for that set, but without gaining knowledge about the result of this subroutine. Accordingly the regularized set-projection operators will be implemented as a black box, applying the sequential decoding scheme to the set of codewords which appear inside that set, taking also into account failure in projecting on the typical subspace of previous codewords, in the code ordering chosen by Bob, see Fig. 2. The resulting setting is clearly redundant as the vast majority of information gathered via the sequential decoding is simply neglected in the process. Also, the same procedure is iterated every time a new bit of the bijective encoding has to be acquired, increasing hence the chances of deteriorating the transmitted codeword. Still, as we shall see in the following, the scheme is efficient enough to allow for the saturation of the Holevo bound.

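In order to formalize this construction, for each quantum codeword $\rho_{\vec{j}^{(\ell)}} \in \mathcal{C}$, we write its corresponding element of the sequential POVM [13] as

$$E_1 = P_{\vec{j}^{(1)}},$$

$$E_\ell = Q_{\vec{j}^{(1)}} \cdots Q_{\vec{j}^{(\ell-1)}}\, P_{\vec{j}^{(\ell)}}\, Q_{\vec{j}^{(\ell-1)}} \cdots Q_{\vec{j}^{(1)}}, \quad \ell \geq 2, \qquad (77)$$

where $P_{\vec{j}}$ is the projector on the typical subspace of codeword state $\rho_{\vec{j}}$ and $Q_{\vec{j}} = 1 - P_{\vec{j}}$ its complement. We recall that by construction these operators fulfill the proper normalization condition,

$$0 \leq E_\ell \leq 1, \qquad (78)$$

$$0 \leq \sum_{\ell=1}^{N} E_\ell = 1 - E_0 \leq 1, \qquad (79)$$

and that, given a density matrix $\rho_{\vec{j}} \in \mathcal{C}$, the probability of recovering the codeword $\vec{j}^{(\ell)}$ is given by

$$P_{seq}(\vec{j}^{(\ell)} | \rho_{\vec{j}}) = \mathrm{Tr}\left[ E_\ell\, \rho_{\vec{j}} \right]. \qquad (80)$$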
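The normalization (79) can be verified directly on a small example. The sketch below builds the elements of Eq. (77) for a list of real symmetric projectors; it is a toy check, assuming real matrices (so that Hermitian conjugation is a transpose) and using the qubit projectors on $|0\rangle$ and $|+\rangle$ in place of typical projectors. All function names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def sub(a, b):
    n = len(a)
    return [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]

def trace_prod(a, b):
    """Tr[a b] for square matrices of equal size."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def sequential_povm(projectors):
    """Return ([E_1, ..., E_N], E_0) following Eq. (77)."""
    n = len(projectors[0])
    elements = []
    prefix = identity(n)                      # Q_1 ... Q_{l-1}
    for p in projectors:
        elements.append(matmul(matmul(prefix, p), transpose(prefix)))
        prefix = matmul(prefix, sub(identity(n), p))
    null = matmul(prefix, transpose(prefix))  # all projections failed
    return elements, null

P1 = [[1.0, 0.0], [0.0, 0.0]]   # projector on |0>
P2 = [[0.5, 0.5], [0.5, 0.5]]   # projector on |+>
elements, E0 = sequential_povm([P1, P2])
print(trace_prod(elements[0], P1), trace_prod(elements[1], P2))
```

In this example the first codeword is always recognized, whereas the second one is recovered with probability 1/4 only, because the first (failed) projection disturbs the state; in the asymptotic setting this disturbance is what Sen's lemma keeps under control.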


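Using this expression we can hence estimate the probability that $\rho_{\vec{j}}$ belongs to the set $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$ by simply summing the above expression over all $\vec{j}^{(\ell)}$ belonging to such set, i.e.

$$P\left( \rho_{\vec{j}} \in \mathcal{C}^{(u)}_{k_1,\dots,k_u} \right) = \sum_{\ell \in \mathcal{C}^{(u)}_{k_1,\dots,k_u}} P_{seq}(\vec{j}^{(\ell)} | \rho_{\vec{j}}) = \mathrm{Tr}\left[ N^{(u)}_{k_1,\dots,k_u}\, \rho_{\vec{j}} \right], \qquad (81)$$

where the sum is performed over the $\ell$'s whose corresponding vector $\rho_{\vec{j}^{(\ell)}}$ belongs to the set $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$, and where

$$N^{(u)}_{k_1,\dots,k_u} := \sum_{\ell \in \mathcal{C}^{(u)}_{k_1,\dots,k_u}} E_\ell \qquad (82)$$

is the set-sequential-measurement associated with the set $\mathcal{C}^{(u)}_{k_1,\dots,k_u}$ induced by the sequential decoding POVM. Since for all $u \in \{1,\dots,u_F\}$ and for all $k_1, k_2, \dots, k_{u-1}$, the sets $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$ and $\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$ are not overlapping (see e.g. Eq. (36)), we have that

$$0 \leq N^{(u)}_{k_1,\dots,k_u} \leq \sum_{k \in \{0,1\}} N^{(u)}_{k_1,\dots,k_{u-1},k} \leq \sum_{\ell=1}^{N} E_\ell \leq 1, \qquad (83)$$

which guarantees that the operators $N^{(u)}_{k_1,\dots,k_{u-1},0}$, $N^{(u)}_{k_1,\dots,k_{u-1},1}$, and $N^{(u)}_{k_1,\dots,k_{u-1},null} := 1 - \sum_{k \in \{0,1\}} N^{(u)}_{k_1,\dots,k_{u-1},k}$ form a properly normalized POVM.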

Lemma 8. For all rates R satisfying the Holevo bound (6), the bisection scheme associated with operators $\{N^{(u)}_{k_1,\dots,k_u}\}$ defined as in Eq. (82) fulfills the sufficient condition (44).

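Proof. First observe that each operator $N^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}$ is the sum of a certain number of sequential POVM elements, always containing the element $E_\ell$ corresponding to the right codeword. Since all the operators in the sum are positive we can state

$$\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S = \sum_{\ell' \in \mathcal{C}^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}} \left\langle \mathrm{Tr}\left[ E_{\ell'}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq \left\langle \mathrm{Tr}\left[ E_\ell\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S, \qquad (84)$$

where in the last term we recognize the average success probability (80) of the sequential protocol computed on the subnormalized version $\tilde{\rho}_{\vec{j}^{(\ell)}}$ of the $\ell$-th codeword. Accordingly we can bound each of the terms on the RHS of Eq. (47) by exploiting the efficiency of the sequential protocol. Specifically, by applying Sen's Lemma 5 and using the concavity of the square root function we can write:

$$\left\langle \mathrm{Tr}\left[ E_\ell\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S = \left\langle \mathrm{Tr}\left[ P_{\vec{j}^{(\ell)}} Q_{\vec{j}^{(\ell-1)}} \cdots Q_{\vec{j}^{(1)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}}\, Q_{\vec{j}^{(1)}} \cdots Q_{\vec{j}^{(\ell-1)}} P_{\vec{j}^{(\ell)}} \right] \right\rangle_S \qquad (85)$$

$$\geq \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S - 2 \sqrt{ \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}}\, Q_{\vec{j}^{(\ell)}} \right] \right\rangle_S + \sum_{\ell' \neq \ell} \left\langle \mathrm{Tr}\left[ \tilde{\rho}_{\vec{j}^{(\ell)}}\, P_{\vec{j}^{(\ell')}} \right] \right\rangle_S }, \qquad (86)$$

having added under the square root all the terms $P_{\vec{j}^{(\ell')}}$ with $\ell' > \ell$. The term outside the square root can be treated as in (51). For the first term under the square root simply apply Eq. (75). For each term of the sum under the square root we can instead use the inequality (65). Therefore we can write

$$\left\langle \mathrm{Tr}\left[ E_\ell\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq 1 - \epsilon_1 - 2 \sqrt{ \epsilon_2 + 2\sqrt{\epsilon_1} + 2^{nR} \cdot 2^{-n[\chi(\{p_j, \rho_j\}) - 2\delta]} }, \qquad (87)$$

which, via Eq. (84), implies again that for rates R fulfilling Eq. (67) for some $\delta > 0$ and for n sufficiently large one has that, for all u and $\ell$,

$$\left\langle \mathrm{Tr}\left[ N^{(u)}_{k_1^{(\ell)},\dots,k_u^{(\ell)}}\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S \geq \left\langle \mathrm{Tr}\left[ E_\ell\, \tilde{\rho}_{\vec{j}^{(\ell)}} \right] \right\rangle_S > 1 - \epsilon_3, \qquad (88)$$

with $\epsilon_3 = O(e^{-n})$ being exponentially small in n and fulfilling the condition (19). This proves (44) and hence the asymptotic achievability of the Holevo bound with the bisection protocol defined by operators (82). □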


V. CONCLUSIONS 

In this article we computed an upper bound for the average error probability (over all 
codewords in a code and over all possible codes) of the bisection decoding scheme. The 
bound is shown to approach zero exponentially fast with the codewords’ length, for any 
output ensemble £ whose size is strictly less than Thus we provided a new proof 

of the attainability of the Holevo bound for classical communication through a quantum
channel for a class of decoding schemes based on the bisection method, whose number of
measurement steps scales as the logarithm of the number of codewords. An advantage of this protocol is the
possibility of gaining a bit of information at each step of the procedure, unlike the sequential 
decoding, which gives either full or null information about the codeword at each step. This 
is particularly powerful in the case of failure at a certain step of the protocol, allowing the 
receiver to at least make use of the previous steps for a partial identification of the message. 
Note also that there is a certain degree of freedom in the implementation of the specific sets’ 
“yes-no” measurements, which form a complete POVM at each step, independently of the 
rest of the protocol, as long as their average error probability approaches zero faster than 
$n^{-2}$ (e.g. exponentially decaying) as the codewords’ length n grows, for all sources respecting
the Holevo bound. This fact has been shown by providing three different POVMs which
satisfy the bound, employing projectors on typical subspaces and renormalizing for their
non-orthogonality. Unfortunately, as in the case of polar coding, these general requirements on the
“yes-no” set measurements do not allow one to evaluate their implementation complexity,
thus leaving open the problem of determining a structured device for the efficient detection
of long codewords. Finally we stress the importance of the Chernoff bound to provide
an exponential scaling to the small quantities used in describing the typical subspaces’
properties, which in turn allows the convergence of the decoding scheme.


Appendix A: The law of large numbers via Chernoff bound 


In this appendix we compute an exponential bound for the law of large numbers, which
guarantees the convergence of the error probability of our protocol to zero. Indeed consider
the small quantities $\epsilon_1$, $\epsilon_2$ which appear in Sec. III A. These quantities describe the high
probability of finding respectively the average state $\rho^{\otimes n}$ and the codeword states $\rho_{\vec{j}}$ in their
typical subspaces, identified by the projectors $P$ and $P_{\vec{j}}$. This is why they are connected,
through the classical typical subspaces, to the law of large numbers.

Consider for example the average state $\rho$ of the source. We can easily prove that the
probability that n copies of the quantum state $\rho$ are in its $\delta$-typical subspace, $\mathrm{Tr}[P \rho^{\otimes n}]$, is
equivalent to the probability of a random sample sequence $\vec{x}$ of the corresponding classical
source being in the classical $\delta$-typical subspace, $\Pr(\vec{x} \in T^n_\delta)$:


Tr [Pp® n ] = Tr 


^ ' dz) (px ^ ^ Q_x' | ^x ') ( 

?e T" x' 

ceTf 3 ' 

Y, q s = Pr(x G 7X). 


x&TV 


A similar result is obtained for each codeword state pp namely 


Tr 


P JPj 


= Pr (f G T? 


(Al) 

(A2) 

(A3) 

(A4) 


These probabilities can be bounded with the help of the law of large numbers.
Consider for example the average typical subspace and choose the random variable Z, taking
values $z = -\log_2 q_x$. We also choose the same probability distribution both for X and Z,
i.e. $q_z = q_x$. Then the law of large numbers states that, for any $\delta > 0$, the probability that
the average of Z over n extractions,

$$\frac{1}{n} \sum_{i=1}^{n} z_i \equiv H(\vec{x}), \qquad (A5)$$

i.e. the sum of n i.i.d. random variables, differs from its expected value,

$$\frac{1}{n} \sum_{i=1}^{n} \bar{z}_i = H(X), \qquad (A6)$$

for more than $\delta$ is lower than a small and positive quantity $\epsilon > 0$, i.e.

$$\Pr(\vec{x} \notin T^n_\delta) = \Pr\left( |H(\vec{x}) - H(X)| > \delta \right) < \epsilon. \qquad (A7)$$


In usual derivations of this result the Chebyshev inequality is exploited, which gives a scaling
behaviour $\epsilon \sim n^{-1}$. This is not sufficient for convergence of the error probability to 0 for
long sequences $n \to \infty$ in (49). Recalling also that the Chebyshev bound gives a dependence
on the variance of the distribution, it is clear that such a scaling is a rough estimate, since
the law of large numbers is known to be valid also for infinite-variance distributions. We
therefore use the Chernoff bound to obtain a faster, indeed exponential, convergence.
Consider first the Markov inequality, valid for any nonnegative random variable $t \geq 0$ and
$\delta > 0$:

$$\Pr(t \geq \delta) = \sum_{t \geq \delta} p_t \leq \sum_{t \geq \delta} p_t\, \frac{t}{\delta} \qquad (A8)$$

$$\leq \sum_{t} p_t\, \frac{t}{\delta} = \frac{\bar{t}}{\delta}, \qquad (A9)$$

where we used a bar sign to indicate the average over the probability distribution of the
random variable. The first inequality follows from introducing factors $t/\delta$ which are certainly
not less than one, given the constraint on the sum. The second inequality follows from adding
positive terms to the sum, since the random variable is positive. We now choose $t = e^{sw}$, with
$w$ a new random variable, and $\delta = e^{sA}$, without loss of generality. The Markov inequality
then reads

$$\Pr\left( e^{sw} \geq e^{sA} \right) \leq e^{-sA}\, g_w(s) \qquad (A10)$$

for any s, A, where we called $g_w(s) = \overline{\exp(sw)}$ the moment generating function of the
random variable w, i.e.

$$\overline{w^n} = \left. \frac{d^n g_w(s)}{ds^n} \right|_{s=0}. \qquad (A11)$$


Now observe that the above inequality between exponentials has two different meanings
depending on the sign of s, implying both

$$\Pr(w \geq A) \leq e^{-sA}\, g_w(s), \quad s > 0, \qquad (A12)$$

$$\Pr(w \leq A) \leq e^{-sA}\, g_w(s), \quad s < 0. \qquad (A13)$$

These two relations give bounds on the tails of the w probability distribution. In order to
evaluate how tight such bounds are, we consider the specific case of w being the sum of n
i.i.d. random variables $x_i$, implying for the moment generating function

$$g_w(s) = \overline{\exp\left( s \sum_{i=1}^{n} x_i \right)} = \prod_{i=1}^{n} \overline{\exp(s x_i)} = \left( g_x(s) \right)^n, \qquad (A14)$$

and take $A = na$, without loss of generality. The previous inequalities become

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} x_i \geq a \right) \leq \exp\left[ -n \left( sa - \ln g_x(s) \right) \right], \quad s > 0, \qquad (A15)$$

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} x_i \leq a \right) \leq \exp\left[ -n \left( sa - \ln g_x(s) \right) \right], \quad s < 0. \qquad (A16)$$

We now need to evaluate the behaviour of the coefficient function in the exponential:

$$h(s) = sa - \ln g_x(s). \qquad (A17)$$

Consider first some properties of $\mu_x(s) = \ln g_x(s)$, following from the nature of the moment
generating function:

• $\mu_x(s = 0) = 0$, since $g_x(s = 0) = 1$;

• $\mu'_x(s = 0) = g'_x(s = 0)/g_x(s = 0) = \bar{x}$, since $g'_x(s = 0) = \bar{x}$;

• it is convex:

$$\mu''_x(s) = \frac{g''_x(s)}{g_x(s)} - \left( \frac{g'_x(s)}{g_x(s)} \right)^2 \qquad (A18)$$

$$= \langle x^2 \rangle_e - \langle x \rangle_e^2 = \left\langle \left( x - \langle x \rangle_e \right)^2 \right\rangle_e \geq 0 \quad \forall s, \qquad (A19)$$

where we have indicated with

$$\langle f(x) \rangle_e = \frac{\overline{f(x)\, e^{sx}}}{\overline{e^{sx}}} \qquad (A20)$$

the probability average with weight $e^{sx}$.

From the previous properties, it follows that the slope of the function, starting at $\bar{x}$ at the
origin, increases for $s > 0$ and decreases for $s < 0$. Expanding $\mu_x(s)$ for small s at second
order, we have for the coefficient function

$$h(s) \simeq (a - \bar{x})\, s - \frac{s^2}{2}\, \mu''_x(0). \qquad (A21)$$

This approximate function (for small s) is zero at

$$s^* = \frac{2 (a - \bar{x})}{\mu''_x(0)}. \qquad (A22)$$

Consider now the $s > 0$ inequality (A15). If $a > \bar{x}$, then the zero $s^*$ is positive and inside
the range of validity of the inequality. Thus $h(s) > 0$ for all $s < s^*$ in the range: the first
inequality has a tight bound. Vice versa if $a < \bar{x}$, the zero $s^*$ is negative and $h(s) < 0$ in
the whole range of validity of the first inequality, making it useless.

The situation is reversed when considering the $s < 0$ inequality (A16). In this case we need
$s^* < 0$, i.e. $a < \bar{x}$, and for any $s > s^*$ in the range the coefficient function will be positive
again, providing a tight bound for the second inequality. By calling

$$h_p = \sup_{s > 0} h(s), \qquad h_m = \sup_{s < 0} h(s), \qquad h_p, h_m > 0, \qquad (A23)$$

the supremum of h(s) in each region, we can thus rewrite the inequalities as tight bounds,
taking respectively $a = \bar{x} + \delta > \bar{x}$ in the first inequality and $a = \bar{x} - \delta < \bar{x}$ in the second
one, for any $\delta > 0$:

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} x_i - \bar{x} \geq \delta \right) \leq e^{-n h_p}, \qquad (A24)$$

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} x_i - \bar{x} \leq -\delta \right) \leq e^{-n h_m}. \qquad (A25)$$

Finally we sum the previous inequalities to obtain the law of large numbers with exponentially
decreasing tails

$$\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} x_i - \bar{x} \right| \geq \delta \right) \leq e^{-n h_p} + e^{-n h_m} = O(e^{-n}) = \epsilon. \qquad (A26)$$


Observe that the small quantity $\epsilon > 0$ obtained in this way, exponentially decreasing with
increasing n, also depends on the difference parameter $\delta$ that we chose, as of course is to be
expected. Indeed this dependence is implicit in the definition of $h_p$, $h_m$: by choosing $\delta$, we
set different values of a (for both the $s > 0$ and $s < 0$ cases) and this in turn varies the point
$s^*$ (A22), i.e. the range of values of s ($s < s^*$ or $s > s^*$) which can be chosen to maximize
the coefficient functions. In particular, since the expression (A21) is a small-s expansion,
we do not know what the absolute supremum of h(s) is and where it is located. Thus by
varying the range of s accessible through the tuning of $\delta$, we may happen to exclude this
and other local supremum points, resulting in (possibly discontinuously) varying values of
$h_p$, $h_m$.

In any case, for our purpose we need only the existence of a range of values s, both above
and below zero and depending on $\delta$, where the coefficient function h(s) is positive, and this
is guaranteed by the properties of the $\mu_x(s)$ function, respectively when $a > \bar{x}$ for positive
s and when $a < \bar{x}$ for negative s.
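To see the difference between the Chebyshev and Chernoff scalings at work, the following sketch evaluates the upper-tail bound (A24) for a Bernoulli variable, for which $g_x(s) = 1 - q + q e^{s}$. It is a numerical illustration only: the Bernoulli source, the parameter values, and the crude grid search used in place of an analytic supremum are all assumptions made here.

```python
import math

def h(s, a, q):
    """Coefficient function h(s) = s a - ln g_x(s) for a Bernoulli(q) variable."""
    return s * a - math.log(1.0 - q + q * math.exp(s))

def sup_h_positive(a, q, s_max=20.0, steps=20000):
    """Grid estimate of h_p = sup_{s>0} h(s), cf. Eq. (A23)."""
    return max(h(s_max * i / steps, a, q) for i in range(1, steps + 1))

def exact_upper_tail(n, q, a):
    """Exact Pr[(1/n) sum x_i >= a] for n i.i.d. Bernoulli(q) variables."""
    k_min = math.ceil(n * a - 1e-12)
    return sum(math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)
               for k in range(k_min, n + 1))

q, delta, n = 0.3, 0.1, 200
h_p = sup_h_positive(q + delta, q)
chernoff = math.exp(-n * h_p)                   # right-hand side of (A24)
chebyshev = q * (1.0 - q) / (n * delta ** 2)    # variance-based 1/n bound
exact = exact_upper_tail(n, q, q + delta)
print(exact, chernoff, chebyshev)
```

For these values the exact tail lies below the Chernoff bound, which in turn is already about ten times smaller than the Chebyshev estimate, and the gap widens exponentially with n.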


Appendix B: Proofs of Lemmas 


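We give here the proofs of the remaining Lemmas of Section III B.

Proof of Lemma 3. For a hermitian operator we can always write $\omega = A - B$, where A, B
are positive matrices with disjoint supports, representing $\omega$ respectively in the positive and
negative part of its support. Consider then the operator $\bar{\Lambda} = \Pi_A - \Pi_B$, with $\Pi_A$ and $\Pi_B$
being projectors respectively on the support of A and of B. For this operator we can clearly
state that $-1 \leq \bar{\Lambda} \leq 1$, i.e. for all vectors $|v\rangle$ we have

$$\langle v | (\bar{\Lambda} - 1) | v \rangle \leq 0, \qquad (B1)$$

$$\langle v | (\bar{\Lambda} + 1) | v \rangle \geq 0. \qquad (B2)$$

By construction we thus obtain an operator which saturates the bound (30):

$$\mathrm{Tr}[\bar{\Lambda} \omega] = \mathrm{Tr}[(\Pi_A - \Pi_B) A] - \mathrm{Tr}[(\Pi_A - \Pi_B) B] \qquad (B3)$$

$$= \mathrm{Tr}[A] + \mathrm{Tr}[B] = \mathrm{Tr}|\omega| = \|\omega\|_1. \qquad (B4)$$

In order to complete the proof, we need to show that $\bar{\Lambda}$ is the maximizing operator among
all possible $-1 \leq \Lambda \leq 1$. First observe, by diagonalising A and B, that

$$\mathrm{Tr}[\Lambda A] = \sum_k \alpha_k \langle a_k | \Lambda | a_k \rangle \leq \sum_k \alpha_k \langle a_k | a_k \rangle = \mathrm{Tr}[A], \qquad (B5)$$

$$\mathrm{Tr}[\Lambda B] = \sum_k \beta_k \langle b_k | \Lambda | b_k \rangle \geq -\sum_k \beta_k \langle b_k | b_k \rangle = -\mathrm{Tr}[B]. \qquad (B6)$$

Thus

$$\mathrm{Tr}[\Lambda \omega] = \mathrm{Tr}[\Lambda A] - \mathrm{Tr}[\Lambda B] \leq \mathrm{Tr}[A] + \mathrm{Tr}[B] = \mathrm{Tr}|\omega| = \|\omega\|_1. \qquad (B7)$$

□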
Proof of Lemma 1. Consider that

$$2 D(\rho, \sigma) = \|\rho - \sigma\|_1 = \max_{\Lambda} \mathrm{Tr}[\Lambda (\sigma - \rho)] \qquad (B8)$$

$$\geq \mathrm{Tr}[E (\sigma - \rho)], \qquad (B9)$$

which follows from applying Lemma 3 and from the fact that $0 \leq E \leq 1$ surely is one of the
operators included in the maximization procedure. The result (27) is then easily obtained
by separating the trace and rearranging terms in the previous inequality. □

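Proof of Lemma 2. Consider that

$$2 D\left( \sqrt{E} \rho \sqrt{E}, \rho \right) = \left\| \rho - \sqrt{E} \rho \sqrt{E} \right\|_1 \qquad (B10)$$

$$\leq \left\| \rho - \sqrt{E} \rho \right\|_1 + \left\| \sqrt{E} \rho - \sqrt{E} \rho \sqrt{E} \right\|_1 = \left\| \left( 1 - \sqrt{E} \right) \rho \right\|_1 + \left\| \sqrt{E} \rho \left( 1 - \sqrt{E} \right) \right\|_1, \qquad (B11)$$

thanks to the triangular inequality for the trace distance. Now for the first term write $\sqrt{\rho}$
in diagonal form $\{\sqrt{\lambda_k}, |\lambda_k\rangle\}$ and use again the triangular inequality for the trace norm: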


l-vTjvp^yvi/tx/t 


< 
i k 


X;^|(l-VB)yp|AXA||| (B12) 


V VKTrfiffuffp (l - Vb) 2 vaAXA 


“ETvCaIVp(i-Ve) Vp|/*>. 

k ' 

Apply then the Cauchy-Schwarz inequality 

\x- y \ 2 < W \ 2 ■ |a| 2 , 

with x k = •J\ k and y k = Jp (l - v7-) fp\ / t ), to obtain 


(B13) 

(B14) 


(B15) 


i - Ve) VpE v^IAXA 


< VpI/j) (B16) 

< t/Tr p(i-/e) 2 , (B17) 
where we used the fact that $\mathrm{Tr}[\rho] = \sum_k \lambda_k = 1$. For the second term in (B11) write instead
$\sqrt{E}$ in its diagonal form $\{\sqrt{e_k}, |e_k\rangle\}$ and proceed in a similar way as before:


E V^k\ e k)(ek\p (l - ^/E) 


<E^ |e fc )(e fc |p (l - \fE 




E) p\e k )(e k \ 


E ( e k\p f 1 “V#) p\e k ) 





(B18) 

(B19) 

(B20) 

(B21) 

(B22) 

(B23) 


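where we used the triangular inequality, the invariance of the trace norm under hermitian
conjugation, the Cauchy-Schwarz inequality, the fact that $\mathrm{Tr}[E] = \sum_k e_k \leq 1$ and the
property $\rho^2 \leq \rho \leq 1$. The inequality (B11) then simply becomes

$$2 D\left( \sqrt{E} \rho \sqrt{E}, \rho \right) \leq 2 \sqrt{ \mathrm{Tr}\left[ \rho \left( 1 - \sqrt{E} \right)^2 \right] } \qquad (B24)$$

$$\leq 2 \sqrt{ \mathrm{Tr}\left[ \rho (1 - E) \right] }, \qquad (B25)$$

since

$$0 \leq E \leq \sqrt{E} \leq 1, \qquad (B26)$$

$$\left( 1 - \sqrt{E} \right)^2 = 1 + E - 2\sqrt{E} \leq 1 - E. \qquad (B27)$$

Finally we take the code average of (B25) and use the concavity of the square-root
function and the hypothesis (28) to obtain the thesis (29):

$$\left\langle 2 D\left( \sqrt{E} \rho \sqrt{E}, \rho \right) \right\rangle \leq 2 \sqrt{ \left\langle \mathrm{Tr}\left[ \rho (1 - E) \right] \right\rangle } \leq 2 \sqrt{\epsilon}. \qquad (B28)$$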
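The inequality (B25) can be checked numerically on a single qubit state. The sketch below is a toy verification under assumptions introduced here: real symmetric $2 \times 2$ matrices, the state $\rho = |0\rangle\langle 0|$, and the operator $E = 1 - \eta |-\rangle\langle -|$ with a hypothetical value of $\eta$; closed-form $2 \times 2$ expressions replace a general eigensolver.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrt_psd_2x2(m):
    """Square root of a real symmetric PSD 2x2 matrix (closed form)."""
    s = math.sqrt(max(m[0][0] * m[1][1] - m[0][1] * m[1][0], 0.0))
    t = math.sqrt(m[0][0] + m[1][1] + 2.0 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t],
            [m[1][0] / t, (m[1][1] + s) / t]]

def trace_norm_sym_2x2(m):
    """Sum of the absolute eigenvalues of a real symmetric 2x2 matrix."""
    half_tr = 0.5 * (m[0][0] + m[1][1])
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    gap = math.sqrt(max(half_tr ** 2 - det, 0.0))
    return abs(half_tr + gap) + abs(half_tr - gap)

eta = 0.2                                   # hypothetical disturbance strength
rho = [[1.0, 0.0], [0.0, 0.0]]              # |0><0|
E = [[1.0 - eta / 2, eta / 2],
     [eta / 2, 1.0 - eta / 2]]              # 1 - eta |-><-|

root = sqrt_psd_2x2(E)
post = matmul(matmul(root, rho), root)      # sqrt(E) rho sqrt(E)
diff = [[rho[i][j] - post[i][j] for j in range(2)] for i in range(2)]

# eps = Tr[rho (1 - E)] = 1 - Tr[E rho]
eps = 1.0 - sum(E[i][k] * rho[k][i] for i in range(2) for k in range(2))
lhs = trace_norm_sym_2x2(diff)              # 2 D(sqrt(E) rho sqrt(E), rho)
rhs = 2.0 * math.sqrt(eps)
print(lhs, rhs)
```

Here $\mathrm{Tr}[\rho(1 - E)] = \eta/2$, and the trace-norm disturbance stays well below $2\sqrt{\eta/2}$, in agreement with the bound.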


□ 
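Appendix C: Derivation of the bisection POVM

Here we provide an explicit derivation of the POVM (40) associated with our bisection
protocol. We consider each step to be carried out as a unitary process on an enlarged
system, consisting of the state $|\Psi\rangle$ received by Bob (we take it pure for simplicity) and
various ancillae, one for each step. The ancillae start in a reference state $|a\rangle$ and will turn
into one of three possible states depending on the result of the measurement. In particular
at the u-th step the ancilla state $|0\rangle$ ($|1\rangle$) corresponds to having found the codeword in set
$\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},0}$ ($\mathcal{C}^{(u)}_{k_1,\dots,k_{u-1},1}$), while the state $|null\rangle$ corresponds to failure.

We start by applying the first-step POVM $\mathcal{M}^{(1)} = \{N^{(1)}_0, N^{(1)}_1, N^{(1)}_{null}\}$:

$$U^{(1)} \left( |\Psi\rangle |a\rangle_1 \right) = \sqrt{N^{(1)}_0}\, |\Psi\rangle |0\rangle_1 + \sqrt{N^{(1)}_1}\, |\Psi\rangle |1\rangle_1 + \sqrt{N^{(1)}_{null}}\, |\Psi\rangle |null\rangle_1. \qquad (C1)$$

After the second step POVM $\mathcal{M}^{(2)}$ we obtain the state

$$\begin{aligned}
U^{(2)} \left( U^{(1)} \left( |\Psi\rangle |a\rangle_1 \right) |a\rangle_2 \right)
&= \sqrt{N^{(2)}_{00}} \sqrt{N^{(1)}_0}\, |\Psi\rangle |0\rangle_1 |0\rangle_2 + \sqrt{N^{(2)}_{01}} \sqrt{N^{(1)}_0}\, |\Psi\rangle |0\rangle_1 |1\rangle_2 \\
&+ \sqrt{N^{(2)}_{10}} \sqrt{N^{(1)}_1}\, |\Psi\rangle |1\rangle_1 |0\rangle_2 + \sqrt{N^{(2)}_{11}} \sqrt{N^{(1)}_1}\, |\Psi\rangle |1\rangle_1 |1\rangle_2 \\
&+ \sqrt{N^{(1)}_{null}}\, |\Psi\rangle |null\rangle_1 |a\rangle_2.
\end{aligned} \qquad (C2)$$

If we stop at this step, the probability of having found $|\Psi\rangle$ in a given set, e.g. $\mathcal{C}^{(2)}_{10}$, is

$$P(10 | \Psi) = \langle \Psi | \sqrt{N^{(1)}_1}\, N^{(2)}_{10}\, \sqrt{N^{(1)}_1} | \Psi \rangle, \qquad (C3)$$

which corresponds to equation (40) for $k = (1, 0)$ and can be easily generalized to an arbitrary
number of steps.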
FIG. 1. Schematic representation of the bisection decoding procedure applied to the incoming codeword. It consists in a sequence of adaptive measurements which are performed in a series of $u_F$ concatenated steps, each being characterized by a POVM (the white circles) which admits three possible outcomes: two being associated respectively to the identification of the corresponding bit as 0 or 1, and one, the null outcome, associated with the event where no decision can be made on the value of the bit. The POVM to be performed at the u-th step depends upon the value of the bits obtained at the previous ones: for instance at the step number 2 Bob will perform either the POVM $\mathcal{M}^{(2)}_0$ or the POVM $\mathcal{M}^{(2)}_1$ depending on the value of $k_1$ he has obtained at the first step of the procedure, while at the step number 3 Bob will perform the POVMs $\mathcal{M}^{(3)}_{00}$, $\mathcal{M}^{(3)}_{01}$, $\mathcal{M}^{(3)}_{10}$, or $\mathcal{M}^{(3)}_{11}$ depending on the values of $k_1$ and $k_2$ obtained in the previous two steps. The figure refers to the case of $u_F = 3$, the red line representing the trajectory which leads Bob to assign the binary string $k = (0,0,1)$ to the received codeword.
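The branching structure of Fig. 1 can be mimicked by a purely classical toy model. The sketch below assumes an idealized, noise-free yes-no oracle (so no null outcome and no disturbance of the state) and codeword sets given by contiguous index ranges; these simplifications, as well as the function names, are introduced here only to show how $\log_2 N$ answers fix the codeword.

```python
def bisection_decode(n_codewords, in_first_half):
    """Identify a codeword index with log2(N) yes-no queries.

    in_first_half(lo, mid) must return True when the unknown index
    lies in the range [lo, mid); it plays the role of the outcome
    k_u = 0 of the set measurement at step u.
    """
    lo, hi = 0, n_codewords
    bits = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if in_first_half(lo, mid):
            bits.append(0)
            hi = mid
        else:
            bits.append(1)
            lo = mid
    return lo, bits

secret = 1  # index of the transmitted codeword (zero-based), hypothetical
index, bits = bisection_decode(8, lambda lo, mid: lo <= secret < mid)
print(index, bits)
```

For N = 8 the loop runs three times; with the second codeword transmitted it follows the trajectory $k = (0,0,1)$, the same string highlighted in the figure.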
FIG. 2. Schematic representation of the set POVMs $\mathcal{M}^{(1)}$, $\mathcal{M}^{(2)}_0$ and $\mathcal{M}^{(2)}_1$ in terms of the sequential POVM decoding procedure (little square elements) for N = 8 codewords. Panel (a): implementation of $\mathcal{M}^{(1)}$. The red color of the square blocks indicates that all the elements of the sequential decoding POVM are active: their outcomes are used to determine whether the incoming codeword belongs to the subset $\mathcal{C}^{(1)}_0$ (first four codewords), or to the subset $\mathcal{C}^{(1)}_1$ (last four codewords), fixing the value of $k_1$. The rectangular elements of the figure indicate that no other information is extracted from the outcomes of the sequential measurement. Panel (b): implementation of $\mathcal{M}^{(2)}_0$, which discriminates between the subsets $\mathcal{C}^{(2)}_{00}$ and $\mathcal{C}^{(2)}_{01}$. This element operates on the state emerging from the port $k_1 = 0$ of $\mathcal{M}^{(1)}$, see e.g. Fig. 1. As indicated by the color, only the first elements of the sequential POVM are active, while the outputs of the remaining ones are equivalent to the null result. Panel (c): implementation of $\mathcal{M}^{(2)}_1$, which discriminates among $\mathcal{C}^{(2)}_{10}$ and $\mathcal{C}^{(2)}_{11}$.